From a2bd86f4dba1b3769f22a617d8abde736fb97004 Mon Sep 17 00:00:00 2001 From: merobi-hub Date: Mon, 8 Apr 2024 16:07:33 -0400 Subject: [PATCH 1/2] Update with new export. Signed-off-by: merobi-hub --- channel/boston-meetup/index.html | 108 + channel/dagster-integration/index.html | 8 +- channel/dev-discuss/index.html | 13074 ++++++++- channel/general/index.html | 22287 +++++++++++++++- channel/github-discussions/index.html | 690 + channel/github-notifications/index.html | 1098 + channel/mark-grover/index.html | 33 + channel/open-lineage-plus-bacalhau/index.html | 26 + channel/providence-meetup/index.html | 71 + channel/sf-meetup/index.html | 16 +- .../index.html | 547 +- index.html | 22287 +++++++++++++++- 12 files changed, 58892 insertions(+), 1353 deletions(-) diff --git a/channel/boston-meetup/index.html b/channel/boston-meetup/index.html index 9bc8e7f..33497b0 100644 --- a/channel/boston-meetup/index.html +++ b/channel/boston-meetup/index.html @@ -529,6 +529,114 @@

Group Direct Messages

+ + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-02-28 13:18:52
+
+

Is there a linkedin post about the Boston meetup I can share? or should i make one?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-02-28 15:04:49
+
+

Feel free to make one! There isn't one yet

+ + + +
+ ✅ Sheeri Cabral (Collibra) +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Rajesh + (rajesh.kaveti@bnymellon.com) +
+
2024-03-21 07:57:18
+
+

@Rajesh has joined the channel

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Rajesh + (rajesh.kaveti@bnymellon.com) +
+
2024-03-21 07:57:42
+
+

New to Boston Meetup - Hello all

+ + + +
+
+
+
+ + + diff --git a/channel/dagster-integration/index.html b/channel/dagster-integration/index.html index 207bab4..cec9ca1 100644 --- a/channel/dagster-integration/index.html +++ b/channel/dagster-integration/index.html @@ -1112,11 +1112,15 @@

Group Direct Messages

One final question - should we make the dagster unit test job “required” in the ci and how can that be configured?

diff --git a/channel/dev-discuss/index.html b/channel/dev-discuss/index.html index 237ae38..26ccc1f 100644 --- a/channel/dev-discuss/index.html +++ b/channel/dev-discuss/index.html @@ -1149,11 +1149,15 @@

Group Direct Messages

is it time to support hudi?

@@ -2452,11 +2456,15 @@

Group Direct Messages

The full project history is now available at https://openlineage.github.io/slack-archives/. Check it out!

@@ -3357,6 +3365,55 @@

Group Direct Messages

+ +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2023-11-29 11:39:11
+
+ + +
+ + + + + + + + + +
+ + +
+ ❤️ Willy Lulciuc +
+ +
+ 🔥 Willy Lulciuc +
+ +
+ 🚀 Willy Lulciuc +
+ +
+
+
+
+ + @@ -4439,12 +4496,12 @@

Group Direct Messages

Feedback requested on the newsletter:

- + - - + @@ -4562,12 +4619,12 @@

Group Direct Messages

*Thread Reply:* it’s open source, should we consider testing it out?

- + - - + @@ -8169,6 +8226,216 @@

Group Direct Messages

+
+
+ + + + +
+ +
Paul Wilson Villena + (pgvillena@gmail.com) +
+
2024-03-06 23:53:51
+
+

*Thread Reply:* Hi All, I am one of the owners of this repo and working to update this to work with MWAA 2.8.1, with apache-airflow-providers-openlineage==1.4.0. I am facing an issue with my set-up. I am using Redshift SQL as a sample use-case for this and getting an error relating to the Default Extractor. Haven't really looked at this in much detail yet but wondering if you have thoughts? I just updated the env variables to use AIRFLOW__OPENLINEAGE__TRANSPORT and AIRFLOW__OPENLINEAGE__NAMESPACE and changed the operator from PostgresOperator to SQLExecuteQueryOperator.
+[2024-03-07 03:52:55,496] Failed to extract metadata using found extractor <airflow.providers.openlineage.extractors.base.DefaultExtractor object at 0x7fc4aa1e3950> - section/key [openlineage/disabled_for_operators] not found in config task_type=SQLExecuteQueryOperator airflow_dag_id=rs_source_to_staging task_id=task_insert_event_data airflow_run_id=manual__2024-03-07T03:52:11.634313+00:00
+[2024-03-07 03:52:55,498] section/key [openlineage/config_path] not found in config
+[2024-03-07 03:52:55,498] section/key [openlineage/config_path] not found in config
+[2024-03-07 03:52:55,499] Executing:
+ insert into event
+ SELECT eventid, venueid, catid, dateid, eventname, starttime::TIMESTAMP
+ FROM s3_datalake.event;

+
+ + + + + + + +
+
Stars
+ 8 +
+ +
+
Language
+ Python +
+ + + + + + + + +
+ + + +
+ 🙌 Shubham Mehta +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Kacper Muda + (kacper.muda@getindata.com) +
+
2024-03-08 06:04:50
+
+

*Thread Reply:* I'll look into it 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Kacper Muda + (kacper.muda@getindata.com) +
+
2024-03-08 06:57:39
+
+

*Thread Reply:* @Paul Wilson Villena It looks like a small mistake in the OL, that I'll fix in the next version - we missed adding a callback there, and getting the airflow configuration raises error when disabled_for_operators is not defined in the airflow.cfg file / the env variable. For now it should help to simply add the <a href="https://airflow.apache.org/docs/apache-airflow-providers-openlineage/1.4.0/configurations-ref.html#id1">[openlineage]</a> section to airflow.cfg, and set disabled_for_operators="" , or just export AIRFLOW__OPENLINEAGE__DISABLED_FOR_OPERATORS="" ,

+ + + +
+ 🙌 Paul Wilson Villena +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Kacper Muda + (kacper.muda@getindata.com) +
+
2024-03-08 07:56:15
+
+

*Thread Reply:* Will be released in the next provider version: https://github.com/apache/airflow/pull/37994

+ + + +
+ 🙌 Jakub Dardziński, Paul Wilson Villena +
+ +
+ 🙏 Shubham Mehta, Paul Wilson Villena +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paul Wilson Villena + (pgvillena@gmail.com) +
+
2024-03-09 07:56:31
+
+

*Thread Reply:* Hi @Kacper Muda it seems I need to also set this:
+os.environ["AIRFLOW__OPENLINEAGE__CONFIG_PATH"]=""
+Otherwise this error persists:
+section/key [openlineage/config_path] not found in config

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Kacper Muda + (kacper.muda@getindata.com) +
+
2024-03-11 03:57:07
+
+

*Thread Reply:* Yes, sorry for missing that. I fixed it in the code and forgot to mention it. If you were not using AIRFLOW__OPENLINEAGE__TRANSPORT, you'd have to set it to an empty string as well, as it's missing the same fallback 🙂

+ + + +
+
+
+
+ + + + +
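The workaround discussed in this thread can be sketched as follows. This is a hedged sketch for the specific apache-airflow-providers-openlineage==1.4.0 issue described above (later provider versions add the missing fallbacks): set the affected `[openlineage]` keys to empty strings so the provider's config lookup does not raise `section/key ... not found in config`.

```python
import os

# Sketch of the workaround above: define the missing [openlineage]
# keys as empty strings (equivalent to exporting the env variables
# in the MWAA/Airflow environment).
os.environ["AIRFLOW__OPENLINEAGE__DISABLED_FOR_OPERATORS"] = ""
os.environ["AIRFLOW__OPENLINEAGE__CONFIG_PATH"] = ""
```

The same effect can be had by adding an `[openlineage]` section with `disabled_for_operators = ""` to `airflow.cfg`, as suggested above.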
@@ -9318,12 +9585,12 @@

Group Direct Messages

*Thread Reply:* I see it too:

- + - - + @@ -9461,12 +9728,12 @@

Group Direct Messages

Spotted!

- + - - + @@ -9508,12 +9775,12 @@

Group Direct Messages

*Thread Reply:* a moment earlier it makes more context

- + - - + @@ -11870,12 +12137,12 @@

ENV OPENLINEAGE_URL=http://foo.bar/```

Due to this decision, we can encounter duplicate runid if we delete the DagRun from the database, because the execution_date remains the same. If I run a backfill job for yesterday, then delete it and run it again, I get the same ids. I'm trying to understand the rationale behind this choice so we can determine whether it's a bug or a feature. 😉

- + - - + @@ -12717,6 +12984,43 @@

ENV OPENLINEAGE_URL=http://foo.bar/```

+ +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-01-16 05:30:24
+
+ + + + + +
+
+
+
+ + @@ -14182,12 +14486,12 @@

ENV OPENLINEAGE_URL=http://foo.bar/```

*Thread Reply:*

- + - - + @@ -14438,13 +14742,50 @@

ENV OPENLINEAGE_URL=http://foo.bar/```

- -
-
+
- + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-01-23 11:34:51
+ +
+
+
+ + + + + +
+
+ +
@@ -14844,7 +15185,7 @@

ENV OPENLINEAGE_URL=http://foo.bar/```

*Thread Reply:* @Harel Shein thanks for the suggestion. Lmk if there's a better way to do this, but here's a link to Google's visualizations: https://docs.google.com/forms/d/1j1SyJH0LoRNwNS1oJy0qfnDn_NPOrQw_fMb7qwouVfU/viewanalytics. And a .csv is attached. Would you use this link on the page or link to a spreadsheet instead?

- + @@ -14974,6 +15315,47 @@

ENV OPENLINEAGE_URL=http://foo.bar/```

+ +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-01-24 16:38:11
+
+ + + + + +
+ 😍 Julien Le Dem, Paweł Leszczyński, Maciej Obuchowski, Harel Shein, Ross Turk +
+ +
+
+
+
+ + @@ -15023,12 +15405,12 @@

ENV OPENLINEAGE_URL=http://foo.bar/```

Thanks for any feedback on the Mailchimp version of the newsletter special issue before it goes out on Monday:

- + - - + @@ -15607,12 +15989,12 @@

ENV OPENLINEAGE_URL=http://foo.bar/```

Decathlon showed part of one of their graphs last night

- + - - + @@ -15680,12 +16062,12 @@

ENV OPENLINEAGE_URL=http://foo.bar/```

*Thread Reply:* some metrics too

- + - - + @@ -16764,12 +17146,12 @@

ENV OPENLINEAGE_URL=http://foo.bar/```

*Thread Reply:*

- + - - + @@ -18662,6 +19044,166 @@

ENV OPENLINEAGE_URL=http://foo.bar/```

+
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-05 15:28:11
+
+

*Thread Reply:* it got merged 👀

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-05 15:31:33
+
+

*Thread Reply:* amazing feedback on a 10k line PR 😅

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-05 15:32:09
+
+

*Thread Reply:* maybe they have policy that feedback starts from 10k lines

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-05 15:32:15
+
+

*Thread Reply:* it wasn’t enough

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-05 15:32:20
+
+

*Thread Reply:* 🙈

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-05 15:32:32
+
+

*Thread Reply:* too big to review, LGTM

+ + + +
+ ☝️ Jakub Dardziński +
+ +
+
+
+
+ + + + +
@@ -18678,12 +19220,12 @@

ENV OPENLINEAGE_URL=http://foo.bar/```

I just noticed this. shared should not have a dependency on spark. 👀

- + - - + @@ -19635,12 +20177,12 @@

ENV OPENLINEAGE_URL=http://foo.bar/```

*Thread Reply:* also 🙂

- + - - + @@ -22571,12 +23113,12 @@

ENV OPENLINEAGE_URL=http://foo.bar/```

*Thread Reply:* People still love to use 2.4.8 🙂

- + - - + @@ -23060,12 +23602,12 @@

ENV OPENLINEAGE_URL=http://foo.bar/```

*Thread Reply:* not sure it did exactly what we want but probably okay for now

- + - - + @@ -23515,6 +24057,58 @@

ENV OPENLINEAGE_URL=http://foo.bar/```

+
+
+ + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-02-28 20:28:49
+
+

*Thread Reply:* to me the risk is more to introduce vulnerabilities/backdoors in the OpenLineage released artifact through pushing a cached image that modifies the result of the build.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-02-28 20:30:29
+
+

*Thread Reply:* The idea of saving the image signature in the repo is that you cannot use a new image in the build without creating a new commit, which gives traceability.

+ + + +
+
+
+
+ + + + +
@@ -24196,6 +24790,12378 @@

ENV OPENLINEAGE_URL=http://foo.bar/```

+ + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-02-27 11:16:28
+
+

gotta skip today meeting. I hope to see you all next week!

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-02-27 12:11:00
+
+

The meetup I mentioned about OpenLineage/OpenTelemetry: https://x.com/J_/status/1565162740246671360
+I speak in English but the other two speakers speak in Hebrew

+
+
X (formerly Twitter)
+ + + + + + + + + + + + + + + + + +
+ + + +
+ 🙏 Willy Lulciuc +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-02-27 13:49:21
+ +
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-02-28 10:20:23
+
+

*Thread Reply:* thanks for sharing that, that otel to ol comparison is going to be very useful for me today :)

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-02-28 13:18:03
+
+

Could use another pair of eyes on this month's newsletter draft if anyone has time today

+ + + + +
+ 🙌 Paweł Leszczyński, Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Kacper Muda + (kacper.muda@getindata.com) +
+
2024-02-28 15:00:46
+
+

*Thread Reply:* LGTM 🙂

+ + + +
+ :gratitude_thank_you: Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-05 14:07:46
+
+

Hey, I created new Airflow AIP. It proposes instrumenting Airflow Hooks and Object Storage to collect dataset updates automatically, to allow gathering lineage from PythonOperator and custom operators. +Feel free to comment on Confluence https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-62+Getting+Lineage+from+Hook+Instrumentation +or on Airflow mailing list: https://lists.apache.org/thread/5chxcp0zjcx66d3vs4qlrm8kl6l4s3m2

+ + + +
+ 🙌 Kacper Muda, Harel Shein, Paweł Leszczyński +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Kacper Muda + (kacper.muda@getindata.com) +
+
2024-03-06 05:25:42
+
+

Hey, does anyone want to add anything here (PR that adds AWS MSK IAM transport)? It looks like it's ready to be merged.

+ + + +
+ :gh_approved: Maciej Obuchowski +
+ +
+ :gh_merged: Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-06 10:14:34
+
+

did we miss a step in publishing 1.9.1? going https://search.maven.org/remote_content?g=io.openlineage&a=openlineage-spark&v=LATEST|here gives me the 1.8 release

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-06 10:17:30
+
+

*Thread Reply:* oh, this might be related to having 2 scala versions now, because I can see the 1.9.1 artifacts

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-06 10:17:35
+
+

*Thread Reply:* yes

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-06 10:17:48
+
+

*Thread Reply:* we may need to fix the docs then https://openlineage.io/docs/integrations/spark/quickstart/quickstart_databricks

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-06 10:18:22
+
+

*Thread Reply:* another place 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-06 10:19:01
+
+

*Thread Reply:* yup

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-06 10:19:44
+
+

*Thread Reply:* https://github.com/OpenLineage/docs/pull/299

+
+ + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-06 10:22:25
+
+

*Thread Reply:* thx :gh_merged:

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-06 13:33:45
+
+

Hi, here's a tentative agenda for next week's TSC (on Wednesday at 9:30 PT):

+ +
  1. Announcements including @Peter Huang's election, Kafka Summit talk, Data Council panel, Boston meetup
  2. Recent release 1.9.1 highlights
  3. Expanded Scala support in Spark overview @Damien Hawes
  4. Circuit breaker in Spark & Flink, built-in lineage in Spark @Paweł Leszczyński
  5. Discussion items
  6. Open discussion
+Am I forgetting anything? Have a discussion item or want to do a demo? 🙂 Let me know. I'll also make a slide deck whether or not I can join next week and share it here. Reminders will go out today, and I believe links, meeting info and invites are all up to date. Please let me know if you spot incorrect meeting info anywhere.
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-06 13:46:49
+
+

*Thread Reply:* I thought @Paweł Leszczyński wanted to present?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-06 13:51:06
+
+

*Thread Reply:* What was the topic? Protobuf or built-in lineage maybe? Or the many docs improvements lately?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-06 13:53:31
+
+

*Thread Reply:* I think so? https://github.com/OpenLineage/OpenLineage/pull/2272

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-06 13:55:44
+
+

*Thread Reply:* Imagine there are lots of folks who would be interested in a presentation on that

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-06 13:58:15
+
+

*Thread Reply:* I think so too 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-03-07 02:22:25
+
+

*Thread Reply:* There are two things worth presenting: circuit breaker
+and/or built-in lineage (once it gets merged).

+ + + +
+ 👍 Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-07 09:08:15
+
+

*Thread Reply:* updating the agenda

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-06 16:07:06
+
+

is there a reason why facet objects have _schemaURL property but BaseEvent has schemaURL?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Willy Lulciuc + (willy@datakin.com) +
+
2024-03-06 16:07:34
+
+

*Thread Reply:* yeah, we use _ to avoid naming conflicts in a facet

+ + + +
+ 👍 Julien Le Dem +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-06 16:07:34
+
+

*Thread Reply:* same goes for producer

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-03-06 16:08:18
+
+

*Thread Reply:* Facets have user defined fields. So all base fields are prefixed

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-03-06 16:08:27
+
+

*Thread Reply:* Base events do not

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Willy Lulciuc + (willy@datakin.com) +
+
2024-03-06 16:08:30
+
+

*Thread Reply:* it should be made more clear… I recently ran into the issue when validating OL events

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-06 16:09:38
+
+

*Thread Reply:* it might be another missing point but we set _producer in BaseFacet: +def __attrs_post_init__(self) -&gt; None: + self._producer = PRODUCER +but we don’t do that for producer in BaseEvent

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-06 16:09:52
+
+

*Thread Reply:* is this supposed to be like that?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-06 16:09:57
+
+

*Thread Reply:* I’m kinda lost 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-03-06 16:10:43
+
+

*Thread Reply:* We should set producer in baseevent as well

+ + + +
+ ☝️ Jakub Dardziński +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-03-06 16:11:35
+
+

*Thread Reply:* The idea is the base event might be produced by the spark integration but the facet might be produced by iceberg library

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-06 16:13:02
+
+

*Thread Reply:* > The idea is the base event might be produced by the spark integration but the facet might be produced by iceberg library +right, it doesn’t require adding _ , it just helps in making the difference

+ +

and also this reason too: +> Facets have user defined fields. So all base fields are prefixed +> Base events do not

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-03-06 16:13:34
+
+

*Thread Reply:* Since users can create custom facets with whatever fields we just tell Them that “_**” is reserved.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-03-06 16:13:55
+
+

*Thread Reply:* So the underscore prefix is a mechanism specific to facets

+ + + +
+
+
+
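The convention Julien describes — reserved base fields on facets prefixed with `_` and filled from a single producer value, while user-defined facet fields stay unprefixed — can be sketched in plain Python. This is an illustrative dataclass sketch, not the openlineage-python client's actual classes:

```python
from dataclasses import dataclass, field

_PRODUCER = "https://example.com/my-integration"  # hypothetical producer URI

@dataclass
class SketchBaseFacet:
    # Base fields are underscore-prefixed so they can never collide
    # with whatever fields a user puts in a custom facet.
    _producer: str = field(default="", init=False)
    _schemaURL: str = field(default="", init=False)

    def __post_init__(self) -> None:
        # Set once for all facets this producer generates.
        self._producer = _PRODUCER

@dataclass
class MyCustomFacet(SketchBaseFacet):
    rowCount: int = 0  # user-defined field: no underscore prefix

facet = MyCustomFacet(rowCount=42)
```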
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-06 16:14:04
+
+

*Thread Reply:* 👍

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-06 16:15:19
+
+

*Thread Reply:* last question:
+we don’t want to block users from setting their own _producer field? it seems the only way now is to use the openlineage.client.facet.set_producer method to override the default, you can’t just do RunEvent(…, _producer='my_own')

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-03-06 16:17:11
+
+

*Thread Reply:* The idea is the producer identifies the code that generates the metadata. So you set it once and all the facets you generate have the same

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-06 16:17:54
+
+

*Thread Reply:* mhm, probably you don’t need to use several producers (at least) per Python module

+ + + +
+ 👍 Julien Le Dem +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-03-06 16:18:09
+
+

*Thread Reply:* Yep

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-03-06 16:18:39
+
+

*Thread Reply:* In airflow each provider should have its own for the facets they produce

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-06 16:18:43
+
+

*Thread Reply:* just searched for set_producer in current docs - no results 😨

+ + + +
+ 😅 Julien Le Dem +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-06 16:19:55
+
+

*Thread Reply:* a number of things will get to the right track after I’m done with generating code 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-03-06 16:20:54
+
+

*Thread Reply:* Thanks for looking into that. If you can fix the doc by adding a paragraph about that, that would be helpful

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-06 16:21:38
+
+

*Thread Reply:* I can create an issue at least 😂

+ + + +
+ 👍 Julien Le Dem +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-06 16:23:44
+
+

*Thread Reply:* there you go: +https://github.com/OpenLineage/docs/issues/300 +if I missed something please comment

+
+ + + + + + + +
+
Assignees
+ <a href="https://github.com/JDarDagran">@JDarDagran</a> +
+ + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-06 17:24:05
+
+

I feel like our getting started with openlineage page is mostly a getting started with Marquez page. but I'm also not sure what should be there otherwise.

+
+
openlineage.io
+ + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-07 09:00:37
+
+

*Thread Reply:* https://openlineage.io/docs/guides/spark ?

+
+
openlineage.io
+ + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-07 09:03:54
+
+

*Thread Reply:* Unfortunately it's probably not that "quick" given the setup required..

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-07 09:04:30
+
+

*Thread Reply:* Maybe better? https://openlineage.io/docs/integrations/spark/quickstart/quickstart_local

+
+
openlineage.io
+ + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-07 12:21:18
+
+

*Thread Reply:* yeah, that's where I was struggling as well. should our quickstart be platform specific? that also feels strange.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-03-07 10:35:46
+
+

Quick question, for the spark.openlineage.facets.disabled property, why do we need to include [;] in the value? Why can't we use , to act as the delimiter? Why do we need [ and ] to enclose the string?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-07 13:22:42
+
+

*Thread Reply:* There was some concrete reason AFAIK right @Paweł Leszczyński?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-03-08 02:23:02
+
+

*Thread Reply:* We do have logic that converts Spark conf entries to OpenLineageYaml without needing to understand their content. I think [] was added for this reason: to know that a Spark conf entry has to be translated into an array.

+ +

Initially disabled facets were just separated by ; . Why not a comma? I don't remember if there was any problem with this.

+ +

https://github.com/OpenLineage/OpenLineage/pull/1271/files -> this PR introduced it

+ +

https://github.com/OpenLineage/OpenLineage/blob/1.9.1/integration/spark/app/src/main/java/io/openlineage/spark/agent/ArgumentParser.java#L152 -> this code checks if the spark conf value is of array type

+
+ + + + + + + + + + + + + + + + +
+ + + +
+
+
+
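Paweł's explanation — a `[...]`-wrapped conf value tells the conf-to-OpenLineageYaml translation that the entry is an array, with `;` as the item delimiter — can be sketched like this. It is a simplified re-implementation for illustration, not the ArgumentParser code itself (the facet names used are the documented `spark.openlineage.facets.disabled` defaults):

```python
def is_array_value(conf_value: str) -> bool:
    # A Spark conf value wrapped in [ and ] is treated as an array.
    v = conf_value.strip()
    return v.startswith("[") and v.endswith("]")

def parse_array_value(conf_value: str) -> list[str]:
    # Items inside the brackets are separated by ';', not ','.
    inner = conf_value.strip()[1:-1]
    return [item.strip() for item in inner.split(";") if item.strip()]

disabled = parse_array_value("[spark_unknown;spark.logicalPlan]")
# disabled == ["spark_unknown", "spark.logicalPlan"]
```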
+ + + + + +
+
+ + + + +
+ +
Peter Huang + (huangzhenqiu0825@gmail.com) +
+
2024-03-07 15:27:02
+
+

Hi team, do we have any proposal or previous discussion of Trino OpenLineage integration?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-07 15:30:33
+
+

*Thread Reply:* There is old third-party integration: https://github.com/takezoe/trino-openlineage

+ +

It has right idea to use EventListener, but I can't vouch if it works

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Peter Huang + (huangzhenqiu0825@gmail.com) +
+
2024-03-07 15:34:18
+
+

*Thread Reply:* Thanks. We are investigating the integration in our org. It will be a good start point 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-07 15:38:13
+
+

*Thread Reply:* I think the ideal solution would be to use EventListener. So far we only have very basic integration in Airflow's TrinoOperator

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-07 15:39:24
+
+

*Thread Reply:* The only thing I haven't really checked out is what the real possibilities are for EventListener in terms of catalog details discovery, e.g. what's the database connection for the catalog.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Peter Huang + (huangzhenqiu0825@gmail.com) +
+
2024-03-07 17:00:41
+
+

*Thread Reply:* Thanks for calling out this. We will evaluate and post some observation in the thread.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Alok + (a_prusty@apple.com) +
+
2024-03-07 18:54:22
+
+

*Thread Reply:* Thanks Peter
+Hey Maciej/Jakub
+Could you please share the process to follow in terms of contributing a Trino OpenLineage integration (design doc and issue?).

+ +

There was an issue for trino integration but it was closed recently. +https://github.com/OpenLineage/OpenLineage/issues/164

+
+ + + + + + + +
+
Labels
+ integration/trino +
+ +
+
Comments
+ 1 +
+ + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-08 04:40:51
+
+

*Thread Reply:* It would be great to see design doc and maybe some POC if possible. I've reopened the issue for you.

+ +

If you get agreement around the design I don't think there are more formal steps needed, but maybe @Julien Le Dem has other idea

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-08 06:36:25
+
+

*Thread Reply:* Trino has their plugins directory btw: +https://github.com/trinodb/trino/tree/master/plugin +including event listeners like: https://github.com/trinodb/trino/tree/master/plugin/trino-mysql-event-listener

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Alok + (a_prusty@apple.com) +
+
2024-03-08 13:40:01
+
+

*Thread Reply:* Thanks Maciej and Jakub
+Yes, the integration will be done with Trino’s event listener framework, which provides details around the query, the source and destination datasets, etc.

+ +

> It would be great to see design doc and maybe some POC if possible. I’ve reopened the issue for you. +Thanks for re-opening the issue. We will add the design doc and POC to the issue.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-03-12 17:57:55
+
+

*Thread Reply:* I agree with @Maciej Obuchowski, a quick design doc followed by a POC would be great. +The integration could either live in OpenLineage or Trino but that can be discussed after the POC.

+ + + +
+ 👍 Alok +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-03-12 17:58:24
+
+

*Thread Reply:* (obviously, adding it to the trino repo would require aproval from the trino community)

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mariusz Górski + (gorskimariusz13@gmail.com) +
+
2024-03-22 09:45:34
+
+

*Thread Reply:* Gentlemen, we are also actively looking into this topic with the same repo from @takezoe as our base. I have submitted a PR to revive this project - it does work, the POC is there in the form of a docker-compose.yaml deployment 🙂 some obvious things are missing for now (like kafka output instead of api) but I think it's a good starting point and it's compatible with latest trino and OL

+
+ + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Peter Huang + (huangzhenqiu0825@gmail.com) +
+
2024-03-26 15:54:02
+
+

*Thread Reply:* Thanks for putting down the foundation for the implementation. Based on it, I feel @Alok would still participate and contribute to it. How about creating a design doc and listing all of the possible TBDs, as @Julien Le Dem suggested?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-30 09:29:24
+
+

*Thread Reply:* Adding @takezoe to this thread. Thanks for your work on a Trino integration and welcome!

+ + + +
+ ❤️ Mariusz Górski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-02 09:11:57
+
+

*Thread Reply:* throwing the CFP for the Trino conference here in case any one of the contributors want to present there https://sessionize.com/trino-fest-2024

+
+
sessionize.com
+ + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-02 09:12:43
+
+

*Thread Reply:* I'm also very happy to help with an idea for an abstract

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Alok + (a_prusty@apple.com) +
+
2024-04-02 12:26:51
+
+

*Thread Reply:* Hey Harel +Just FYI we are already engaged with Trino community to have a talk around Trino open lineage integration and have submitted an Abstract for review.

+ + + +
+ 🎉 Harel Shein +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-02 12:33:55
+
+

*Thread Reply:* once you release the integration, please add a reference about it to OpenLineage docs! +https://github.com/OpenLineage/docs

+
+ + + + + + + +
+
Website
+ <https://openlineage.io> +
+ +
+
Stars
+ 9 +
+ + + + + + + + +
+ + + +
+ 👍 Alok, Mariusz Górski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mariusz Górski + (gorskimariusz13@gmail.com) +
+
2024-04-02 12:51:54
+
+

*Thread Reply:* I think it's ready for review https://github.com/trinodb/trino/pull/21265 just with API sink integration, additional features can be added at @Alok's convenience as next PRs

+
+ + + + + + + +
+
Labels
+ docs +
+ +
+
Comments
+ 12 +
+ + + + + + + + + + +
+ + + +
+ 🎉 Michael Robinson +
+ +
+ 👍 Alok +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-08 11:12:43
+
+

Hey, there’s a discrepancy between the docs and the actual behaviour in Airflow. The docs say it completely blocks emitting OL events at the operator class level. The actual behaviour is that it only blocks metadata extraction (so, for instance, it doesn’t call the Snowflake DB for SnowflakeOperator). My question is what the desired behaviour should be. Thoughts so far:

+ +
  1. current name indicates it should block emission (similar to disabled option)
  2. imo it doesn’t make sense to emit empty events with basic Airflow info only - from OL perspective it’s way more informative to attach inputs/outputs information +Thanks for any opinion!
+
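A minimal sketch of the option-2 behaviour discussed above: when an operator class is disabled, skip the whole event, not just metadata extraction. The names (`DISABLED_OPERATORS`, `on_task_start`) are illustrative, not the actual Airflow provider API:

```python
# Hedged sketch of option 2: a disabled operator produces no event at all.
# Names are illustrative, not the real OpenLineage provider API.
DISABLED_OPERATORS = {"SnowflakeOperator"}

def should_emit(operator_class: str) -> bool:
    """False means: no metadata extraction and no emission for this task."""
    return operator_class not in DISABLED_OPERATORS

def on_task_start(operator_class: str, emitted: list) -> None:
    if not should_emit(operator_class):
        return  # option 2: emit nothing, not even a bare START event
    emitted.append({"eventType": "START", "producer": operator_class})
```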
+ + + +
+ 👍 Maciej Obuchowski +
+ +
+ 🤔 Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
tati + (tatiana.alchueyr@astronomer.io) +
+
2024-03-08 11:14:07
+
+

*Thread Reply:* I believe we should not extract or emit any open lineage events if this option is used

+ + + +
+ ➕ Kacper Muda +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-08 11:14:07
+
+

*Thread Reply:* I'm for option 2, don't send any event from task

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
tati + (tatiana.alchueyr@astronomer.io) +
+
2024-03-08 11:44:54
+
+

*Thread Reply:* @Jakub Dardziński do you see any use case for not extracting metadata extraction but still emitting events?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-08 11:53:31
+
+

*Thread Reply:* The use case AFAIK was the old SnowflakeOperator bug; we wanted to disable the collection there, since it zombified the task. The events being emitted still gave information about the status of the task as well as non-dataset-related metadata

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-08 11:53:38
+
+

*Thread Reply:* but I think it's less relevant now

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-08 11:58:09
+
+

*Thread Reply:* ^ this and you might want to have information about task execution because OL is a backend for some task-tracking system

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
tati + (tatiana.alchueyr@astronomer.io) +
+
2024-03-08 12:05:01
+
+

*Thread Reply:* Hm, I believe users don't expect us to spend time processing/extracting OL events if this configuration is used. It's the documented behaviour

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-08 12:09:39
+
+

*Thread Reply:* the question is if we should change docs or behaviour

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-08 12:09:45
+
+

*Thread Reply:* I believe the latter

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
tati + (tatiana.alchueyr@astronomer.io) +
+
2024-03-08 12:46:39
+
+

*Thread Reply:* +1 behaviour

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-08 13:39:26
+
+

*Thread Reply:* +1

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-11 21:08:22
+
+

Hi, here's the in-progress agenda for Wednesday

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-11 22:06:19
+
+

*Thread Reply:* Looks like a great agenda! Left a couple of comments

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-11 22:06:43
+
+

*Thread Reply:* @Michael Robinson will you be able to facilitate or do you need help?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Kacper Muda + (kacper.muda@getindata.com) +
+
2024-03-12 05:57:39
+
+

*Thread Reply:* I'm also missing from the committer list, but can't comment on slides 🙂

+ + + +
+ 😱 Harel Shein +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-12 11:22:16
+
+

*Thread Reply:* Sorry about that @Kacper Muda. Gave you access just now

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-12 11:22:37
+
+

*Thread Reply:* We probably need to add you to lists posted elsewhere... I'll check

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Kacper Muda + (kacper.muda@getindata.com) +
+
2024-03-12 11:22:52
+
+

*Thread Reply:* No worries, thanks 🙂 !

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-12 05:54:38
+
+

https://github.com/open-metadata/OpenMetadata/pull/15317 👀

+ + + +
+ 🔥 Jakub Dardziński, Harel Shein, Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-12 11:23:15
+
+

*Thread Reply:* this is awesome

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-12 11:26:48
+
+

*Thread Reply:* it looks like they use temporary deployments to test...

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-12 11:43:35
+
+

*Thread Reply:* yeah the GitHub history is wild

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-12 11:33:04
+
+

Hi, I'm at the conference hotel and my earbuds won't pair with my new mac for some reason. Does the agenda look good? Want to send out the reminders soon. I'll add the OpenMetadata news!

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-12 11:42:00
+
+

*Thread Reply:* I think we can also add the Datahub PR?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-12 11:47:51
+
+

*Thread Reply:* @Paweł Leszczyński prefers to present only the circuit breakers

+ + + +
+ 👍 Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-12 11:47:55
+ +
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-12 11:48:09
+
+

*Thread Reply:* This one?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-12 11:48:15
+
+

*Thread Reply:* yes!

+ + + +
+ 🔥 Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-12 13:09:47
+
+

It's been a while since we've updated the twitter profile. Current description: "A standard api for collecting Data lineage and Metadata at runtime." What would you think of using our website's tagline: "An open framework for data lineage collection and analysis." Other ideas?

+ + + +
+ 👍 Maciej Obuchowski, Harel Shein, Julien Le Dem, Kacper Muda +
+ +
+ ✅ Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-13 12:34:32
+
+

can someone grant me write access to our forked sqlparser-rs repo?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-13 12:34:41
+
+

*Thread Reply:* @Julien Le Dem maybe?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-03-13 12:38:24
+
+

*Thread Reply:* I should probably add the committer group to it

+ + + +
+ ➕ Harel Shein, Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-03-13 12:42:44
+
+

*Thread Reply:* I have made the committer group maintainer on this repo

+ + + +
+ 🙏 Harel Shein, Maciej Obuchowski +
+ +
+ ❤️ Peter Huang +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-13 17:19:20
+
+

https://github.com/OpenLineage/OpenLineage/pull/2514 +small but mighty 😉

+
+ + + + + + + +
+
Labels
+ ci, common +
+ + + + + + + + + + +
+ + + +
+ 🥳 Kacper Muda, Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-14 11:52:15
+
+

Regarding the approved release, based on the additions it seems to me like we should make it a minor release (so 1.10.0). Any objections? Changes are here: https://github.com/OpenLineage/OpenLineage/compare/1.9.1...HEAD

+ + + +
+ ➕ Harel Shein, Paweł Leszczyński +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Kacper Muda + (kacper.muda@getindata.com) +
+
2024-03-14 14:16:09
+
+

We encountered a case of a START event exceeding 2MB in Airflow. This was traced back to an operator with unusually long arguments and attributes. Further investigation revealed that our Airflow events contain redundant data across different facets, leading to unnecessary bloating of event sizes (those long attributes and args were attached three times to a single event). I proposed to remove some redundant facets and to refine the operator's attributes inclusion logic within AirflowRunFacet. I am not sure how breaking this change is, but some systems might depend on the current setup. Suggesting an immediate removal might not be the best approach, and I'd like to know your thoughts. (A similar problem exists within the Airflow provider.) +CC @Maciej Obuchowski @Willy Lulciuc @Jakub Dardziński

+ +

https://github.com/OpenLineage/OpenLineage/pull/2509
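A minimal sketch of the bloat described above: the same long operator argument attached to several facets multiplies the serialized event size, and pruning the duplicates shrinks it. The facet names below are illustrative stand-ins, not the exact facets in the PR:

```python
import json

def event_size_bytes(event: dict) -> int:
    """Serialized size of an OL event, e.g. to spot >2MB START events."""
    return len(json.dumps(event).encode("utf-8"))

# Stand-in for an unusually long operator argument.
long_arg = "SELECT ..." + "x" * 2000

# Illustrative event where the same long arg is attached three times:
event = {"run": {"facets": {
    "airflow": {"task": {"sql": long_arg}},
    "unknownSourceAttribute": {"items": [{"properties": {"sql": long_arg}}]},
    "airflowRunArgs": {"sql": long_arg},
}}}

def prune(event: dict, keep: str = "airflow") -> dict:
    """Keep the long attributes in a single facet only."""
    facets = event["run"]["facets"]
    return {"run": {"facets": {keep: facets[keep]}}}
```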

+
+ + + + + + + +
+
Labels
+ integration/airflow, extractor +
+ +
+
Comments
+ 1 +
+ + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-14 15:06:14
+
+

As mentioned during yesterday's TSC, we can't get insight into DataHub's integration from the PR description in their repo. And it's a very big PR. Does anyone have any intel? PR is here: https://github.com/datahub-project/datahub/pull/9870

+
+ + + + + + + +
+
Labels
+ ingestion, product, devops +
+ +
+
Comments
+ 1 +
+ + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-14 15:07:51
+
+

Changelog PR for 1.10 is RFR: https://github.com/OpenLineage/OpenLineage/pull/2516

+
+ + + + + + + +
+
Labels
+ documentation +
+ + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-14 16:20:59
+
+

@Julien Le Dem @Paweł Leszczyński Release is failing in the Java client job due to (I think) the version of spotless: +```Could not resolve com.diffplug.spotless:spotlessplugingradle:6.21.0. + Required by: + project : > com.diffplug.spotless:com.diffplug.spotless.gradle.plugin:6.21.0

+ +
+

No matching variant of com.diffplug.spotless:spotlessplugingradle:6.21.0 was found. The consumer was configured to find a library for use during runtime, compatible with Java 8, packaged as a jar, and its dependencies declared externally, as well as attribute 'org.gradle.plugin.api-version' with value '8.4'```

+
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-14 16:53:58
+
+

*Thread Reply:* @Michael Robinson https://github.com/OpenLineage/OpenLineage/pull/2517

+
+ + + + + + + +
+
Labels
+ client/java +
+ + + + + + + + + + +
+ + + +
+ ✅ Michael Robinson +
+ +
+ 🙌 Paweł Leszczyński +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-14 18:17:25
+
+

fix to broken main: +https://github.com/OpenLineage/OpenLineage/pull/2518

+
+ + + + + + + +
+
Labels
+ integration/dagster +
+ +
+
Comments
+ 1 +
+ + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-14 18:47:34
+
+

*Thread Reply:* Thanks, just tried again

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-14 18:48:38
+
+

*Thread Reply:* ? +it needs approval and merge 😛

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-14 18:50:52
+
+

*Thread Reply:* Oh oops disregard

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-14 18:50:57
+
+

*Thread Reply:* different PR

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-14 18:51:22
+
+

*Thread Reply:* 👍

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-14 19:01:47
+
+

There's an issue with the Flink job on CI: +** What went wrong: +Could not determine the dependencies of task ':shadowJar'. +&gt; Could not resolve all dependencies for configuration ':runtimeClasspath'. + &gt; Could not find io.**********************:**********************_sql_java:1.10.1. + Searched in the following locations: + - <https://repo.maven.apache.org/maven2/io/**********************/**********************-sql-java/1.10.1/**********************-sql-java-1.10.1.pom> + - <https://packages.confluent.io/maven/io/**********************/**********************-sql-java/1.10.1/**********************-sql-java-1.10.1.pom> + - file:/home/circleci/.m2/repository/io/**********************/**********************-sql-java/1.10.1/**********************-sql-java-1.10.1.pom + Required by: + project : &gt; project :shared + project : &gt; project :flink115 + project : &gt; project :flink117 + project : &gt; project :flink118

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-14 19:33:58
+
+

*Thread Reply:* https://github.com/OpenLineage/OpenLineage/pull/2521

+
+ + + + + + + +
+
Labels
+ ci +
+ + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-14 19:34:11
+
+

*Thread Reply:* @Jakub Dardziński still awake? 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-14 19:35:42
+
+

*Thread Reply:* it’s just approval bot

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-14 19:38:11
+
+

*Thread Reply:* created issue on how to avoid those in the future https://github.com/OpenLineage/OpenLineage/issues/2522

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-14 19:39:05
+
+

*Thread Reply:* https://app.circleci.com/jobs/github/OpenLineage/OpenLineage/188526 I lack emojis on this server to fully express my emotions

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-14 19:39:49
+
+

*Thread Reply:* https://openlineage.slack.com/archives/C065PQ4TL8K/p1710454645059659 +you might have missed that

+
+ + +
+ + + } + + Jakub Dardziński + (https://openlineage.slack.com/team/U02S6F54MAB) +
+ + + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-14 19:40:22
+
+

*Thread Reply:* merge -> rebase -> problem gone

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-15 09:56:16
+
+

*Thread Reply:* PR to update the changelog is RFR @Jakub Dardziński @Maciej Obuchowski: https://github.com/OpenLineage/OpenLineage/pull/2526

+
+ + + + + + + +
+
Labels
+ documentation +
+ + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-14 19:24:46
+
+

https://github.com/OpenLineage/OpenLineage/pull/2520 +It’s a long-awaited PR - feel free to comment!

+
+ + + + + + + +
+
Labels
+ client/python +
+ + + + + + + + + + +
+ + + +
+ 🎉 Kacper Muda, Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-14 20:48:21
+
+

OpenLineage is trending upward on OSSRank. Please vote!

+
+
oss-rank
+ + + + + + + + + + + + + + + + + +
+ + + +
+ ✅ Jakub Dardziński, Peter Huang +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-15 17:28:36
+
+

https://github.com/OpenLineage/OpenLineage/blob/main/spec/facets/ParentRunFacet.json#L20 +here the format is uuid +however, if you follow the logic for the parent id in the current dbt integration, you might discover that the parent run facet is assigned the value of the DAG’s run_id (which is not a uuid)

+ +

@Julien Le Dem, what has higher priority? I think lots of people are using the dbt-ol wrapper with the current lineage_parent_id macro
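For illustration, a non-uuid Airflow run_id can be mapped deterministically onto a spec-compliant UUID with `uuid5`. The namespace constant here is made up for the sketch; it is not what the dbt-ol wrapper or the lineage_parent_id macro actually uses:

```python
import uuid

# Illustrative namespace; NOT the one any OpenLineage integration uses.
OL_NS = uuid.UUID("00000000-0000-0000-0000-00000000beef")

def run_id_to_uuid(airflow_run_id: str) -> str:
    """Deterministically map e.g. 'scheduled__2024-03-15T00:00:00' to a UUID,
    so the same DAG run always yields the same parent run id."""
    return str(uuid.uuid5(OL_NS, airflow_run_id))
```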

+
+ + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-03-19 10:31:30
+
+

*Thread Reply:* It is a uuid because it should be the id of an OL run

+ + + +
+ 👍 Jakub Dardziński +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-18 12:21:37
+
+

where can I find who has write access to OL repo?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-18 12:29:00
+
+

*Thread Reply:* Settings > Collaborators and teams

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-18 12:33:41
+
+

*Thread Reply:* thanks Michael, seems like I don’t have enough permissions to see that

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-03-19 10:31:57
+
+

Sorry, I have a dr appointment today and won’t join the meeting

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-19 10:32:24
+
+

*Thread Reply:* I gotta skip too. Maciej and Pawel are at the Kafka Summit

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-19 10:32:36
+
+

*Thread Reply:* I hope you’re fine!

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-03-19 12:24:18
+
+

*Thread Reply:* I am fine thank you 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-03-19 12:24:20
+
+

*Thread Reply:* just a visit

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-19 10:35:57
+
+

Should we cancel the sync today?

+ + + +
+ 👍 Michael Robinson, Kacper Muda, Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-20 10:02:53
+
+

looking at XTable today, any thoughts on how we can collaborate with them?

+
+
xtable.apache.org
+ + + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-20 10:09:30
+
+

*Thread Reply:* @Julien Le Dem @Willy Lulciuc this reminds me of some ideas we had a few years ago.. :)

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-20 10:16:38
+
+

*Thread Reply:* hmm.. ok. maybe not that relevant for us, at first I thought this was an abstraction for read/write on top of Iceberg/Hudi/Delta.. but I think this is more of a data sync appliance. would still be relevant for linking together synced datasets (but I don't think it's that important now)

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Peter Huang + (huangzhenqiu0825@gmail.com) +
+
2024-03-20 13:21:26
+
+

*Thread Reply:* From the introduction https://www.confluent.io/blog/introducing-tableflow/, looks like they are using Flink for both data ingestion and compaction. It means we should at least consider supporting a Hudi source and sink for Flink lineage 🙂

+
+
Confluent
+ + + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-21 13:21:29
+
+

A key growth metric trending in the right direction:

+ + + + +
+ 🚀 Kacper Muda, Harel Shein, Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-21 14:45:21
+
+

Eyes on this PR to add OpenMetadata to the Ecosystem page would be appreciated: https://github.com/OpenLineage/docs/pull/303. TIA! @Mariusz Górski

+
+ + + + + + + +
+
Comments
+ 1 +
+ + + + + + + + + + +
+ + + +
+ 🚀 Jakub Dardziński, Harel Shein +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-21 15:21:55
+
+

I really want to improve this page in the docs, anyone wants to work with me on that?

+
+
openlineage.io
+ + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-21 15:22:40
+
+

*Thread Reply:* perhaps also make this part of the PR process, so when we add support for something, we remember to update the docs

+ + + +
+ ➕ Willy Lulciuc, Paweł Leszczyński +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Willy Lulciuc + (willy@datakin.com) +
+
2024-03-21 15:22:55
+
+

*Thread Reply:* I free up next week and would love to chat… obviously, time permitting but the page needs some love ❤️

+ + + +
+ ❤️ Harel Shein, Paweł Leszczyński +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-21 15:24:40
+
+

*Thread Reply:* I can verify the information once you have some PR 🙂

+ + + +
+ 🙏 Harel Shein +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-22 12:53:38
+
+

RFR: a PR to add DataHub to the Ecosystem page https://github.com/OpenLineage/docs/pull/304

+
+ + + + + + + +
+
Comments
+ 1 +
+ + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-22 12:55:17
+
+

*Thread Reply:* The description comes from the very brief README in DataHub's GH repo and a glance at the code. No other documentation or resources appear to be available.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-22 12:58:43
+
+

*Thread Reply:* @Tamás Németh

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-22 15:57:42
+
+

Dagster is launching column-lineage support for dbt using the sqlglot parser https://github.com/dagster-io/dagster/pull/20407

+
+ + + + + + + +
+
Comments
+ 4 +
+ + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-22 17:17:03
+
+

*Thread Reply:* I kinda like their approach of using post-hooks to enable column-level lineage: a custom macro collects information about columns and logs it, and they parse the log after execution

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-22 17:17:32
+
+

*Thread Reply:* it doesn’t force dbt docs generate step that some might not want to use

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-22 17:17:57
+
+

*Thread Reply:* but at the same time reuses DBT adapter to make additional calls to retrieve missing metadata

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Willy Lulciuc + (willy@datakin.com) +
+
2024-03-23 14:32:29
+
+

@Paweł Leszczyński interesting project I came across over the weekend: https://github.com/HamaWhiteGG/flink-sql-lineage

+
+ + + + + + + +
+
Stars
+ 323 +
+ +
+
Language
+ Java +
+ + + + + + + + +
+ + + +
+ 👍 Julien Le Dem +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-03-25 03:16:45
+
+

*Thread Reply:* Wow, this is something we would love to have (flink SQL support). It's great to know that people around the globe are working on the same thing and heading same direction. Great finding @Willy Lulciuc. Thanks for sharing!

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-25 06:57:01
+
+

*Thread Reply:* On Kafka Summit I've talked with Timo Walther from Flink SQL team and he proposed alternative approach.

+ +

Flink SQL has stable (across releases) CompiledPlan JSON text representation that could be parsed, and has all the necessary info - as this is used for serializing actual execution plan both ways.
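A hedged sketch of that approach: walk the CompiledPlan JSON and collect table identifiers from nodes that reference a table. The JSON shape below is a simplified stand-in for illustration, not Flink's actual CompiledPlan serialization:

```python
import json

# Simplified stand-in for a Flink SQL CompiledPlan. The real schema is
# defined (and kept stable) by Flink; this exact shape is an assumption.
PLAN = """
{"nodes": [
  {"type": "TableSourceScan", "table": {"identifier": "`cat`.`db`.`src`"}},
  {"type": "Calc"},
  {"type": "Sink", "table": {"identifier": "`cat`.`db`.`dst`"}}
]}
"""

def extract_tables(plan_json: str) -> list:
    """Collect the table identifier of every plan node that references a table."""
    nodes = json.loads(plan_json)["nodes"]
    return [n["table"]["identifier"] for n in nodes if "table" in n]
```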

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Peter Huang + (huangzhenqiu0825@gmail.com) +
+
2024-03-29 19:43:38
+
+

*Thread Reply:* As Flink SQL converts to transformations before execution, technically speaking our existing solution is already able to create lineage info for Flink SQL apps (not including column lineage and table schemas, which can be inferred within the Flink table environment). I will create a Flink SQL job for e2e testing purposes.

+ + + +
+ 👍 Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Peter Huang + (huangzhenqiu0825@gmail.com) +
+
2024-03-29 19:45:49
+
+

*Thread Reply:* I am also working on the Flink side for table lineage. Hopefully, the new lineage features can be released in Flink 1.20.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-25 09:58:57
+
+

Sessions for this year's Data+AI Summit have been published. A search didn't turn up anything related to lineage, but did you know Julien and Willy's talk at last year's summit has received 4k+ views? 👀

+
+
databricks.com
+ + + + + + + + + + + + + + + +
+
+
YouTube
+ +
+ + + } + + Databricks + (https://www.youtube.com/@Databricks) +
+ + + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-25 10:16:40
+
+

*Thread Reply:* seems like our talk was not accepted, but I can see 9 sessions on unity catalog 😕

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-26 05:59:45
+ +
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-26 05:59:55
+
+

finally merged 🙂

+ + + +
+ 🎉 Harel Shein, Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-26 06:00:18
+
+

pawel-big-lebowski commented on Nov 21, 2023 +whoa

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-26 07:27:22
+
+

I’ll miss the sync today (on the way to data council)

+ + + +
+ 🔥 Paweł Leszczyński, Maciej Obuchowski, Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-03-26 12:06:44
+
+

*Thread Reply:* Same

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-26 13:59:03
+
+

*Thread Reply:* have fun at the conference!

+ + + +
+ ❤️ Harel Shein +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-03-26 13:23:08
+
+

OK @Maciej Obuchowski - 1 job has many stages; 1 stage has many tasks. Transitively, this means that 1 job has many tasks.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-26 13:58:44
+
+

*Thread Reply:* batch or streaming one? 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-03-26 14:01:36
+
+

*Thread Reply:* Doesn't matter. It's the same concept.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-03-26 13:27:58
+
+

Also @Paweł Leszczyński, seems Spark metrics has this:

+ +

local-1711474020860.driver.LiveListenerBus.listenerProcessingTime.io.openlineage.spark.agent.OpenLineageSparkListener + count = 12 + mean rate = 1.19 calls/second + 1-minute rate = 1.03 calls/second + 5-minute rate = 1.01 calls/second + 15-minute rate = 1.00 calls/second + min = 0.00 milliseconds + max = 1985.48 milliseconds + mean = 226.81 milliseconds + stddev = 549.12 milliseconds + median = 4.93 milliseconds + 75% &lt;= 53.64 milliseconds + 95% &lt;= 1985.48 milliseconds + 98% &lt;= 1985.48 milliseconds + 99% &lt;= 1985.48 milliseconds + 99.9% &lt;= 1985.48 milliseconds

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-27 09:23:49
+
+

Do you think Bipan's team could potentially benefit significantly from upgrading to the latest version of openlineage-spark? https://openlineage.slack.com/archives/C01CK9T7HKR/p1711483070147019

+
+ + +
+ + + } + + Bipan Sihra + (https://openlineage.slack.com/team/U06RFHBSTHR) +
+ + + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-27 09:55:04
+
+

*Thread Reply:* @Paweł Leszczyński wdyt?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-03-27 10:00:01
+
+

*Thread Reply:* I think the issue here is that marquez is not able to properly visualize parent run events that Maciej has added recently for a Spark application

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-27 10:03:22
+
+

*Thread Reply:* So if they downgraded would they have a graph closer to what they want?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-27 11:23:31
+
+

*Thread Reply:* I don't see parent run events there?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-27 09:54:54
+
+

I'm exploring ways to improve the demo gif in the Marquez README. An improved and up-to-date demo gif could also be used elsewhere -- in the Marquez landing pages, for example, and the OL docs. Along with other improvements to the landing pages, I created a new gif that's up to date and higher-resolution, but it's large (~20 MB). +• We could put it on YouTube and link to it, but that would downgrade the user experience in other ways. +• We could host it somewhere else, but that would mean adding another tool to the stack and, depending on file size limits, could cost money. (I can't imagine it would cost much, but I haven't really looked into this option yet. Regardless of cost, it seems to have the same drawbacks as YT from a UX perspective.) +• We could have GitHub host it in another repo (for free) in the Marquez or OL orgs. + ◦ It could go in the OL Docs because it's likely we'll want to use it in the docs anyway, but even if we never serve it, wouldn't this create issues for local development at a minimum? I opened a PR to do this, which a PR with other improvements is waiting on, but I'm not sure about this approach. + ◦ It could go in the unused Marquez website repo, but there's a good chance we'll forget it's there and remove or archive the repo without moving it first. + ◦ In another repo, or even a new one for stuff like this? +Anyone have an opinion or know of a better option?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-28 10:52:31
+
+

*Thread Reply:* maybe make it a HTML5 video?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-28 10:55:32
+
+

*Thread Reply:* https://wp-rocket.me/blog/replacing-animated-gifs-with-html5-video-for-faster-page-speed/

+
+
WP Rocket
+ + + + + + +
+
Written by
+ Raelene Morey +
+ +
+
Est. reading time
+ 9 minutes +
+ + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-29 10:51:18
+
+

*Thread Reply:* 👀

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-28 10:52:04
+
+

@Julien Le Dem @Harel Shein how did Data Council panel and talk go?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-28 10:53:30
+
+

*Thread Reply:* Was just composing the message below :)

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-28 10:53:05
+
+

Some great discussions here at data council, the panel was really great and we can definitely feel energy around OpenLineage continuing to build up! 🚀 +Thanks @Julien Le Dem for organizing and shoutout to @Ernie Ostic @Sheeri Cabral (Collibra) @Eric Veleker for taking the time and coming down here and keeping pushing more and building the community! ❤️

+ + + +
+ 🏄‍♂️ Michael Robinson, Maciej Obuchowski +
+ +
+ 👍 Ernie Ostic +
+ +
+ 🎉 tati +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-29 11:08:13
+
+

*Thread Reply:* @Harel Shein did anyone take pictures?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-29 11:10:54
+
+

*Thread Reply:* there should be plenty of pictures from the conference organizers, we'll ask for some

+ + + +
+ 🙌 Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-29 11:15:03
+
+

*Thread Reply:* Did a search and didn't see anything

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-29 11:16:59
+
+

*Thread Reply:* here's one

+ +
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-03-29 11:17:07
+
+

*Thread Reply:* Speaker dinner the night before: https://www.linkedin.com/posts/datacouncil-aidatacouncil-ugcPost-7178852429705224193-De46

+
+
linkedin.com
+ + + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-03-29 11:17:19
+
+

*Thread Reply:* Ahah. Same picture

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-29 11:17:51
+
+

*Thread Reply:* haha. Julien and Ernie look great while I'm explaining how to land an airplane 🛬

+ + + +
+ 😊 Ernie Ostic +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-29 11:44:29
+
+

*Thread Reply:* Great pic!

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-03-30 20:40:06
+
+

*Thread Reply:* The photo gallery is there

+
+
Pixieset
+ + + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-03-30 20:47:50
+
+

*Thread Reply:*

+ +
+ + + + + + + + + +
+
+ + + + + + + + + +
+
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-04-01 09:01:34
+
+

*Thread Reply:* awesome! just in time for the newsletter 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Eric Veleker + (eric@atlan.com) +
+
2024-04-05 22:29:56
+
+

*Thread Reply:* Thank you for thinking of us. Onwards and upwards.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Peter Huang + (huangzhenqiu0825@gmail.com) +
+
2024-03-28 15:53:09
+
+

I just noticed that the naming conventions for Hive/Iceberg/Hudi are not listed in the doc https://openlineage.io/docs/spec/naming/. Shall we standardize them further? Any suggestions?

+
+
openlineage.io
+ + + + + + + + + + + + + + + +
+ + + +
+ 👍 Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-28 16:22:22
+
+

*Thread Reply:* Yes. This also came up in a conversation with one of the maintainers of dbt-core, we can also pick up on a proposal to extend the naming conventions markdown to something a bit more scalable.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-28 16:23:29
+
+

*Thread Reply:* What you think about this proposal? +https://github.com/OpenLineage/OpenLineage/pull/1702

+
+ + + + + + + +
+
Labels
+ documentation, proposal +
+ +
+
Comments
+ 3 +
+ + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Peter Huang + (huangzhenqiu0825@gmail.com) +
+
2024-03-28 16:52:33
+
+

*Thread Reply:* Thanks for sharing the info. Will take a deeper look later today.

+ + + +
+ 👍 Harel Shein +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mariusz Górski + (gorskimariusz13@gmail.com) +
+
2024-03-29 02:14:19
+
+

*Thread Reply:* I think this is similar topic to resource naming in ODD, might be worth to take a look for inspiration: https://github.com/opendatadiscovery/oddrn-generator

+
+ + + + + + + +
+
Stars
+ 4 +
+ +
+
Language
+ Python +
+ + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-29 05:42:19
+
+

*Thread Reply:* the thing is we need to have language-agnostic way of defining those naming conventions and be able to generate code for them, similar to facets spec

+ + + +
+ 👍 Mariusz Górski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mariusz Górski + (gorskimariusz13@gmail.com) +
+
2024-03-29 08:10:22
+
+

*Thread Reply:* it could also be an idea to have a micro REST API embedded in each client, so the naming conventions would be managed there and each client (python/java) could run it as a subprocess 🤔

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-01 12:44:15
+
+

*Thread Reply:* we can also just write it in Rust, @Maciej Obuchowski 😁

+ + + +
+ 👍 Mariusz Górski +
+ +
+ 😅 Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-01 13:11:01
+
+

*Thread Reply:* no real changes/additions, but starting to organize the doc for now: https://github.com/OpenLineage/OpenLineage/pull/2554

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-29 11:14:25
+
+

@Maciej Obuchowski we also heard some good things about the sqlglot parser. have you looked at it recently?

+
+ + + + + + + +
+
Website
+ <https://sqlglot.com/> +
+ +
+
Stars
+ 5285 +
+ + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-29 12:59:29
+
+

*Thread Reply:* I love the fact that our parser is in type safe language :)

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-29 14:14:46
+
+

*Thread Reply:* does it matter after all when it comes to parsing SQL?
it might be worth running some comparisons, but it may turn out that sqlglot misses most of the Snowflake dialect that we currently support

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-29 16:56:04
+
+

*Thread Reply:* We'd miss on Java side parsing as well

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-29 16:57:37
+
+

*Thread Reply:* very importantly this ^

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-03-29 17:23:36
+
+

*Thread Reply:* That’s important. Yes

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-04-01 10:05:24
+
+

OpenLineage 1.11.0 release vote is now open: https://openlineage.slack.com/archives/C01CK9T7HKR/p1711980285409389

+
+ + +
+ + + } + + Michael Robinson + (https://openlineage.slack.com/team/U02LXF3HUN7) +
+ + + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-04-02 11:29:23
+
+

Sorry, I’ll be late to the sync

+ + + +
+ 👍 Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-02 12:56:31
+
+

forgot to mention, but we have the TSC meeting coming up next week. we should start sourcing topics

+ + + +
+ 👍 Michael Robinson, Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-04-03 15:58:23
+
+

*Thread Reply:* 1.10 and 1.11 releases
Data Council, Kafka Summit, & Boston meetup shout outs and quick recaps
Datadog poc update or demo?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-04-03 16:04:39
+
+

*Thread Reply:* Discussion item about Trino integration next steps?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-04-03 16:06:16
+
+

*Thread Reply:* Accenture+Confluent roundtable reminder for sure

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-04-03 16:24:17
+
+

*Thread Reply:* job to job dependencies discussion item? https://openlineage.slack.com/archives/C065PQ4TL8K/p1712153842519719

+
+ + +
+ + + } + + Julian LaNeve + (https://openlineage.slack.com/team/U0544QC1DS9) +
+ + + + + + + + + + + + + + + + + +
+ + + +
+ ➕ Harel Shein +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-03 16:43:56
+
+

*Thread Reply:* I think it's too early for a Datadog update tbh, but I like the job-to-job discussion.
We can also bring up the naming library discussion that we talked about yesterday

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-02 21:19:03
+
+

one more thing, if we want we could also apply for a free Datadog account for OpenLineage and Marquez: https://www.datadoghq.com/partner/open-source/

+
+
Datadog
+ + + + + + + + + + + + + + + + + +
+ + + +
+ 👀 Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 05:27:02
+
+

*Thread Reply:* would be nice for tests

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julian LaNeve + (lanevejulian@gmail.com) +
+
2024-04-03 10:17:22
+
+

is there any notion of process dependencies in openlineage? i.e. if I have two airflow tasks that depend on each other, with no dataset in between, can I express that in the openlineage spec?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-03 11:39:25
+
+

*Thread Reply:* AFAIK no, it doesn't aim to reflect that
cc @Julien Le Dem

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-04-03 11:42:55
+
+

*Thread Reply:* It is not in the core spec but this could be represented as a job facet. It is probably in the airflow facet right now but we could add a more generic job dependency facet

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 11:52:36
+
+

*Thread Reply:* we do represent hierarchy though - with ParentRunFacet

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 11:57:16
+
+

*Thread Reply:* if we were to add some dependency facet, what would we want to model?

+ +
  1. we want to note the dependency between jobs, not between particular runs, so
a. we are in job X and want to note that job Y will run after it ends
b. we are in job Y and want to note that it ran because it depended on a successful run of job X
  2. we also want to note the dependency between particular runs:
a. we are in run x of job X, and want to note that run y of job Y will happen after it ends
b. we are in run y of job Y, and want to note that it depended (as in - ran because the preceding job(s) finished successfully) on run x of job X
  3. +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 11:59:47
+
+

*Thread Reply:* do we also want to model something like Airflow's trigger rules? https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html#trigger-rules

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-03 12:46:01
+
+

*Thread Reply:* I don't think this is about hierarchy though, right? If I understand @Julian LaNeve correctly, I think it's more #2

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julian LaNeve + (lanevejulian@gmail.com) +
+
2024-04-03 12:48:25
+
+

*Thread Reply:* yeah it's less about hierarchy - definitely more about #2.

+ +

assume we have a DAG that looks like this:
Task A -&gt; Task B -&gt; Task C
today, OL can capture the full set of dependencies if we do:
A -&gt; (dataset 1) -&gt; B -&gt; (ds 2) -&gt; C
but it's not always the case that you have datasets between everything. my question was more so around "how can I use OL to capture the relationship between jobs if there are no datasets in between"
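The difference can be sketched quickly: with today's spec, job-to-job edges only fall out of shared datasets. A minimal illustration with hypothetical event dicts (this is not the OpenLineage client API, and the job/dataset names are made up):

```python
# Hypothetical OpenLineage-style event dicts; names are illustrative only.
event_a = {"job": {"name": "task_a"}, "outputs": [{"namespace": "s3", "name": "ds1"}]}
event_b = {"job": {"name": "task_b"}, "inputs": [{"namespace": "s3", "name": "ds1"}]}
event_c = {"job": {"name": "task_c"}, "inputs": []}  # no dataset in between

def downstream_of(producer, consumer):
    # consumer depends on producer iff one of its inputs matches a producer output
    outs = {(d["namespace"], d["name"]) for d in producer.get("outputs", [])}
    return any((d["namespace"], d["name"]) in outs for d in consumer.get("inputs", []))
```

With no dataset between `task_a` and `task_c`, the edge is simply invisible to this dataset-matching logic — which is exactly the gap the thread is discussing.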

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-04-03 12:52:32
+
+

*Thread Reply:* I had opened an issue to track this a while ago but we did not get too far in the discussion: https://github.com/OpenLineage/OpenLineage/issues/552

+
+ + + + + + + +
+
Labels
+ enhancement +
+ +
+
Comments
+ 2 +
+ + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julian LaNeve + (lanevejulian@gmail.com) +
+
2024-04-03 12:53:19
+
+

*Thread Reply:* oh nice - unsurprisingly you were 2 years ahead of me 😆

+ + + +
+ 😅 Julien Le Dem +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-04-03 12:53:57
+
+

*Thread Reply:* You can track the dependency both at the job level and at the run level.
At the job level you would do something along the lines of:
job: { facets: {
  job_dependencies: {
    predecessors: [
      { namespace: , name: }, ...
    ],
    successors: [
      { namespace: , name: }, ...
    ]
  }
}}

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-04-03 12:56:57
+
+

*Thread Reply:* At the run level you could track the actual task run dependencies:
run: { facets: {
  run_dependencies: {
    predecessor: [ "{run uuid}", ...],
    successors: [...],
  }
}}
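Assembled as a concrete event, that run-level idea could look like the sketch below. To be clear: the `run_dependencies` facet name and shape are only this thread's proposal, not part of the OpenLineage spec, and the run IDs are placeholders.

```python
# Builds a START-style event dict carrying the *proposed* run_dependencies
# facet. Facet name/shape follow the sketch in this thread, not the OL spec;
# run IDs and job names below are placeholders.
def run_event_with_dependencies(run_id, job_ns, job_name, predecessor_run_ids):
    return {
        "eventType": "START",
        "run": {
            "runId": run_id,
            "facets": {
                "run_dependencies": {
                    "predecessors": list(predecessor_run_ids),
                    "successors": [],
                }
            },
        },
        "job": {"namespace": job_ns, "name": job_name},
    }

event = run_event_with_dependencies(
    "0190e0a1-0000-7000-8000-000000000001",
    "airflow",
    "dag.task_b",
    ["0190e0a1-0000-7000-8000-000000000000"],
)
```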

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-04-03 13:00:42
+
+

*Thread Reply:* I think the current airflow run facet contains that information in an airflow specific representation: https://github.com/apache/airflow/blob/main/airflow/providers/openlineage/plugins/facets.py

+
+ + + + + + + + + + + + + + + + +
+ + + +
+ 👍 Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-04-03 13:02:16
+
+

*Thread Reply:* I think we should have the discussion in the ticket so that it does not get lost in the slack history

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 13:21:13
+
+

*Thread Reply:* run: { facets: {
  run_dependencies: {
    predecessor: [ "{run uuid}", ...],
    successors: [...],
  }
}}
I like this format, but would have full run/job identifier as ParentRunFacet

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-04-03 13:23:05
+
+

*Thread Reply:* For the trigger rules I wonder if this is too specific to airflow.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Julien Le Dem + (julien@apache.org) +
+
2024-04-03 13:23:27
+
+

*Thread Reply:* But if there’s a generic way to capture this, it makes sense

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-04-03 11:13:23
+
+

Don't forget to register for this! https://events.confluent.io/roundtable-data-lineage/Accenture

+
+
events.confluent.io
+ + + + + + + + + + + + + + + + + +
+ + + +
+ 👀 Maciej Obuchowski +
+ +
+ 👍 Harel Shein, Peter Huang +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-04-03 17:24:31
+
+

This attempt at a SQLAlchemy integration was basically working, if not perfectly, the last time I played with it: https://github.com/OpenLineage/OpenLineage/pull/2088. What more do I need to do to get it to the point where it can be merged as an "experimental"/"we warned you" integration? I mean, other than make sure it's still working and clean it up? 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-04 06:46:06
+
+

https://docs.getdbt.com/docs/collaborate/column-level-lineage#sql-parsing

+
+
docs.getdbt.com
+ + + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-04 06:48:15
+
+

*Thread Reply:* seems like it’s only for dbt cloud

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-04 07:20:56
+
+

*Thread Reply:* &gt; Column-level lineage relies on SQL parsing.
Was thinking about doing the same thing at some point

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-04 07:21:12
+
+

*Thread Reply:* Basically with dbt we know schemas, so we also can resolve wildcards as well
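A toy sketch of that idea, with a made-up schema dict and helper (not the actual sqlparser API): once schemas are known, a `select *` can be expanded into explicit column-level lineage pairs.

```python
# Hypothetical wildcard expansion: with model schemas known (e.g. from dbt's
# catalog), "select * from source" becomes concrete column lineage pairs.
# The schema dict and table names are invented for illustration.
schemas = {"raw.orders": ["id", "amount", "customer_id"]}

def expand_star(target, source, schemas):
    # each output column of `target` descends from the same-named source column
    return [(target + "." + col, source + "." + col) for col in schemas[source]]

lineage = expand_star("analytics.orders", "raw.orders", schemas)
```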

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-04 07:22:01
+
+

*Thread Reply:* but that requires adding capability for providing known schema into sqlparser

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-04 07:23:15
+
+

*Thread Reply:* that's not very hard to add afaik 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-04 07:23:27
+
+

*Thread Reply:* not exactly into sqlparser too

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-04 07:23:32
+
+

*Thread Reply:* just our parser

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-04 07:23:46
+
+

*Thread Reply:* yeah, our parser

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-04 07:23:55
+
+

*Thread Reply:* still someone has to add it :D

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-04 07:24:04
+
+

*Thread Reply:* some rust enthusiast probably

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-04 07:24:14
+
+

*Thread Reply:* 👀

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-04 07:27:56
+
+

*Thread Reply:* but also: dbt provides schema info only if you generate catalog.json with generate docs command

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-04 07:36:13
+
+

*Thread Reply:* Right now we have the dbt-ol wrapper anyway, so we can run another dbt docs command on behalf of the user too

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-04 07:39:17
+
+

*Thread Reply:* not sure if running commands on behalf of user is good idea, but denoting in docs that running it increases accuracy of column-level lineage is probably a good idea

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-04 07:39:22
+
+

*Thread Reply:* once we build it

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-04 07:39:24
+
+

*Thread Reply:* of course

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-04 07:42:41
+
+

*Thread Reply:* That depends, what are the side effects of running dbt docs?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-04 07:49:19
+
+

*Thread Reply:* the other option is similar to dagster's approach - run post-hook macro that prints schema to logs and read the logs with dbt-ol wrapper

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-04 07:49:40
+
+

*Thread Reply:* which again won't work in dbt cloud - there catalog.json seems like the only option

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-04 08:06:26
+
+

*Thread Reply:* &gt; That depends, what are the side effects of running dbt docs?
refreshing someone's documentation? 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-04 09:51:44
+
+

*Thread Reply:* it would be configurable imho, if someone doesn't want column-level lineage at the price of an additional step, it's their choice

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-04 11:57:44
+
+

*Thread Reply:* yup, agreed. I'm sure we can also run dbt docs to a temp directory that we'll delete right after

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-04-04 14:29:06
+
+

Some encouraging stats from Sonatype: these are Spark integration downloads (unique IPs) over the last 12 months

+ +
+ + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-04-04 14:30:45
+
+

*Thread Reply:* That's an increase of 17560.5%

+ + + +
+ 🎉 Harel Shein, Jakub Dardziński +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-04 18:43:27
+
+

*Thread Reply:* https://github.com/OpenLineage/OpenLineage/releases/tag/1.11.3
that's a lot of notes 😮

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-04-04 15:15:42
+
+

Marquez committers: there's a committer vote open 👀

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-04 15:22:19
+
+

did anyone submit a CFP here? https://sessionize.com/open-source-summit-europe-2024/ +it's a linux foundation conference too

+
+
sessionize.com
+ + + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 10:57:08
+
+

*Thread Reply:* looks like a nice conference

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-05 11:29:43
+
+

*Thread Reply:* too far for me, but might be a train ride for you?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 11:37:19
+
+

*Thread Reply:* yeah, I might submit something 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 11:37:45
+
+

*Thread Reply:* and I think there are actually direct trains to Vienna from Warsaw

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-05 04:53:56
+
+

Hmm @Maciej Obuchowski @Paweł Leszczyński - I see we released 1.11.3, but I don't see the artifacts in central. Are the artifacts blocked?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-05 04:54:03
+ +
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-05 04:54:42
+
+

*Thread Reply:* after last release, it took me some 24h to see openlineage-flink artifact published

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-05 04:56:02
+
+

*Thread Reply:* I recall something about the artifacts had to be manually published from the staging area.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-05 05:11:40
+
+

*Thread Reply:* @Maciej Obuchowski - can you check if the release is stuck in staging?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-05 05:11:53
+
+

*Thread Reply:* I recall last time it failed because there wasn't a javadoc associated with it

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-05 05:23:25
+
+

*Thread Reply:* Nevermind @Paweł Leszczyński @Maciej Obuchowski - it seems like the search indexes haven't been updated.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-05 05:23:36
+ +
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 06:01:51
+
+

*Thread Reply:* @Michael Robinson has to manually promote them but it's not instantaneous I believe

+ + + +
+ 👍 Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-05 09:23:51
+
+

I'm seeing some really strange behavior with OL Spark, I'm going to give some data to help out, but these are still breadcrumbs unfortunately. 🧵

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-05 09:25:06
+
+

*Thread Reply:* the driver for this job is running for more than 5 hours, but the job actually finished after 20 minutes

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-05 09:25:08
+
+

*Thread Reply:*

+ +
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-05 09:25:56
+
+

*Thread Reply:* most of the CPU time in those 5 hours is spent in openlineage methods

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-05 09:25:59
+
+

*Thread Reply:*

+ +
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-05 09:26:36
+
+

*Thread Reply:* it's also not reproducible 😕

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-05 09:26:46
+
+

*Thread Reply:* but happens "sometimes"

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 09:55:37
+
+

*Thread Reply:* DatasetIdentifier.equals?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 09:55:58
+
+

*Thread Reply:* can you check what calls it?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-05 11:10:46
+
+

*Thread Reply:* unfortunately, some of the stack frames are truncated by JVM

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-05 11:14:08
+
+

*Thread Reply:*

+ +
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-05 11:17:03
+
+

*Thread Reply:* top methods by time:

+ +
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-05 11:18:27
+
+

*Thread Reply:* maybe this has something to do with SymLink and the lombok implementation of .equals() ?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-05 11:19:31
+
+

*Thread Reply:* and then some sort of circular dependency

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 11:19:32
+ +
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 11:19:46
+
+

*Thread Reply:* but is this a JDBC job?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-05 11:20:01
+
+

*Thread Reply:* let me see

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-05 11:20:08
+
+

*Thread Reply:* I don't think so

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-05 11:23:34
+
+

*Thread Reply:* it's not

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 11:23:52
+
+

*Thread Reply:* ok, we don't use lang3 Pair a lot - it has to be in ColumnLevelLineageBuilder 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-05 11:30:08
+
+

*Thread Reply:* yes.. I'm staring at that class for a while now

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 11:33:43
+
+

*Thread Reply:* what's the rough size of the logical plan of the job?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 11:33:59
+
+

*Thread Reply:* I'm trying to understand whether we're looking at some infinite loop

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 11:34:06
+
+

*Thread Reply:* or just something done very inefficiently

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 11:35:07
+
+

*Thread Reply:* like every input being added in this manner:
public void addInput(ExprId exprId, DatasetIdentifier datasetIdentifier, String attributeName) {
  inputs.computeIfAbsent(exprId, k -> new LinkedList<>());

  Pair&lt;DatasetIdentifier, String&gt; input = Pair.of(datasetIdentifier, attributeName);

  if (!inputs.get(exprId).contains(input)) {
    inputs.get(exprId).add(input);
  }
}
it's a candidate: it has to traverse the list returned from inputs for every CLL dependency field added
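The cost pattern being described can be demonstrated in a few lines. This is an illustrative model, not the Spark integration's actual code: membership tests on a list scan every element, so deduplicating n inputs that way does roughly n²/2 comparisons, while a hash set keeps each add O(1).

```python
# Illustrative cost model of the dedup-on-LinkedList pattern (not the actual
# OpenLineage Spark code). `in` on a list scans all current elements, so n
# unique adds cost ~n^2/2 comparisons; a set does an O(1) lookup per add.
def add_all_with_list(items):
    seen, comparisons = [], 0
    for item in items:
        comparisons += len(seen)   # list membership scans every element
        if item not in seen:
            seen.append(item)
    return comparisons

def add_all_with_set(items):
    seen = set()
    for item in items:
        if item not in seen:       # expected O(1) hash lookup
            seen.add(item)
    return len(seen)
```

For 1000 unique inputs the list version performs about half a million element comparisons; the set version performs 1000 hash lookups.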

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 11:35:59
+
+

*Thread Reply:* it looks like we're building a size-N list in N^2 time:
inputs.stream()
    .filter(i -&gt; i instanceof InputDatasetFieldWithIdentifier)
    .map(i -&gt; (InputDatasetFieldWithIdentifier) i)
    .forEach(
        i -&gt;
            context
                .getBuilder()
                .addInput(
                    ExprId.apply(i.exprId().exprId()),
                    new DatasetIdentifier(
                        i.datasetIdentifier().getNamespace(), i.datasetIdentifier().getName()),
                    i.field()));
🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 11:39:42
+
+

*Thread Reply:* ah, this isn't even used now since it's for new extension-based spark collection

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 11:39:58
+
+

*Thread Reply:* @Paweł Leszczyński this is most likely a future bug ⬆️

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 11:42:13
+
+

*Thread Reply:* I think we're still doing it now anyway:
private static void extractInternalInputs(
    LogicalPlan node,
    ColumnLevelLineageBuilder builder,
    List&lt;DatasetIdentifier&gt; datasetIdentifiers) {

  datasetIdentifiers.stream()
      .forEach(
          di -> {
            ScalaConversionUtils.fromSeq(node.output()).stream()
                .filter(attr -> attr instanceof AttributeReference)
                .map(attr -> (AttributeReference) attr)
                .collect(Collectors.toList())
                .forEach(attr -> builder.addInput(attr.exprId(), di, attr.name()));
          });
}

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 11:53:44
+
+

*Thread Reply:* and that's linked list - must be pretty slow jumping all those pointers

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 12:01:48
+ +
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 12:12:22
+
+

*Thread Reply:* There are some more funny places in CLL code, like we're iterating over the list of schema fields and calling some function with the name of that field:
schema.getFields().stream()
    .map(field -&gt; Pair.of(field, getInputsUsedFor(field.getName())))
then immediately iterating over it a second time to get the field back from its name:
List&lt;Pair&lt;DatasetIdentifier, String&gt;&gt; getInputsUsedFor(String outputName) {
  Optional&lt;OpenLineage.SchemaDatasetFacetFields&gt; outputField =
      schema.getFields().stream()
          .filter(field -&gt; field.getName().equalsIgnoreCase(outputName))
          .findAny();
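That double scan could be avoided by indexing fields by name once. A hypothetical Python sketch (field names invented, and lower() stands in for the case-insensitive match the Java code does with equalsIgnoreCase):

```python
# Sketch of a fix for the repeated per-name scan: build a case-insensitive
# name index over the schema fields once, then look fields up in O(1).
# Field names here are invented for illustration.
fields = [{"name": "id"}, {"name": "amount"}, {"name": "customer_id"}]

# one pass to build the index (replacing an O(n) scan per lookup)
by_name = {f["name"].lower(): f for f in fields}

def get_field(output_name):
    return by_name.get(output_name.lower())
```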

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-05 12:51:15
+
+

*Thread Reply:* I think the time spent by the driver (5 hours) just on these methods smells like an infinite loop?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-05 12:51:59
+
+

*Thread Reply:* like, as inefficient as it may be, this is a lot of time

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 12:52:20
+
+

*Thread Reply:* did it finish eventually?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-05 12:52:51
+
+

*Thread Reply:* yes... but.. I wonder if something killed it somewhere?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 12:52:58
+
+

*Thread Reply:* I mean, it can be something like 10000^3 loop 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-05 12:53:01
+
+

*Thread Reply:* I couldn't find anything in the logs to indicate

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 12:53:10
+
+

*Thread Reply:* and it has to do those pair comparisons

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 12:54:12
+
+

*Thread Reply:* would be easier if we could see the general size of a plan of this job - if it's something really small then I'm probably wrong

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 12:54:37
+
+

*Thread Reply:* but if there are 1000s of columns... anything can happen 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-05 12:55:09
+
+

*Thread Reply:* yeah.. trying to find out. I don't have that facet enabled there, and I can't find the ol events in the logs (it's writing to console, and I think they got dropped)

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 12:58:32
+
+

*Thread Reply:* DevNullTransport 🙂

+ + + +
+ 😅 Harel Shein, Jakub Dardziński +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 13:04:37
+ +
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-05 13:08:09
+
+

*Thread Reply:* generally speaking, we have a similar problem here like we had with Airflow integration

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-05 13:08:35
+
+

*Thread Reply:* we are not holding up the job per se, but... we are holding up the spark application

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-05 13:09:07
+
+

*Thread Reply:* do we have a way to be defensive about that somehow, shutdown hook from spark to our thread or something

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 13:10:26
+
+

*Thread Reply:* there's no magic

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 13:10:44
+
+

*Thread Reply:* circuit breaker with timeout does not work?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-05 13:12:03
+
+

*Thread Reply:* it would, but we don't turn that on by default

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-05 13:12:18
+
+

*Thread Reply:* also, if we do, what should be our default values?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 13:14:08
+
+

*Thread Reply:* what would not hurt you if you enabled it, 30 seconds?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 13:14:23
+
+

*Thread Reply:* I guess we should aim much lower with the runtime

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-05 13:21:41
+
+

*Thread Reply:* yeah, and make sure we emit metrics / logs when that happens

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-05 13:27:31
+
+

*Thread Reply:* wait, our circuit breaker right now only supports cpu & memory

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-05 13:27:39
+
+

*Thread Reply:* we would need to add a timeout one, right?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 13:38:27
+
+

*Thread Reply:* ah, yes

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 13:38:41
+
+

*Thread Reply:* we've talked about it but it's not implemented yet https://github.com/OpenLineage/OpenLineage/blob/3dad978a3a76ea9bb709334f1526086f95[…]o/openlineage/client/circuitBreaker/ExecutorCircuitBreaker.java

+
+ + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
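[Editor's note] A timeout circuit breaker of the kind discussed here could bound how long the integration spends collecting lineage before giving up. This is only a sketch of the idea with hypothetical names — not the actual ExecutorCircuitBreaker API:

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutBreaker {
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "openlineage-timeout-breaker");
        t.setDaemon(true); // a daemon thread never holds up application shutdown
        return t;
    });

    /** Run the work, returning empty if it exceeds the timeout. */
    public <T> Optional<T> callWithTimeout(Callable<T> work, long timeoutMillis) {
        Future<T> future = executor.submit(work);
        try {
            return Optional.of(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the runaway lineage collection
            return Optional.empty();
        } catch (Exception e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        TimeoutBreaker breaker = new TimeoutBreaker();
        Optional<String> fast = breaker.callWithTimeout(() -> "ok", 1000);
        Optional<String> slow = breaker.callWithTimeout(() -> {
            Thread.sleep(5000);
            return "never";
        }, 50);
        System.out.println(fast.orElse("timed out") + " / " + slow.orElse("timed out"));
    }
}
```

A real implementation would also emit the metrics/logs mentioned above when the breaker trips, so a tripped breaker is visible rather than silent.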
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 13:41:32
+
+

*Thread Reply:* and BTW, no abnormal CPU or memory usage?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-05 13:44:13
+
+

*Thread Reply:* nope, not at all

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-05 13:51:02
+
+

*Thread Reply:* green line is when spark job actually finishes, but the graph is the whole runtime of the driver

+ +
+ + + + + + + + + +
+
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-05 13:51:33
+
+

*Thread Reply:*

+ +
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 14:16:19
+
+

*Thread Reply:* I mean, it's using 100% of one core 🙂

+ + + +
+ 🙃 Harel Shein +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-08 02:05:06
+
+

*Thread Reply:* it's similar to what aniruth experienced. there's something that for some type of logical plans causes recursion alike behaviour. However, I don't think it's recursion bcz it's ending at some point. If we had DebugFacet we would be able to know which logical plan nodes are involved in this.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-08 10:09:17
+
+

*Thread Reply:* I'll try to get that for us

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-08 13:23:34
+
+

*Thread Reply:* > If we had DebugFacet we would be able to know which logical plan nodes are involved in this. +if the event would not take 1GB 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-08 13:24:16
+
+

*Thread Reply:* > it's similar to what aniruth experienced. there's something that for some type of logical plans causes recursion alike behaviour. However, I don't think it's recursion bcz it's ending at some point. If we had DebugFacet we would be able to know which logical plan nodes are involved in this. (edited) +what about my thesis that something is just extremely slow in column-level lineage code?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-04-05 16:25:37
+
+

Some adoption metrics from Sonatype and PyPI, visualized using Preset. In Preset, you can see the number for each month (but we're out of seats on the free tier there). The big number is the downloads for the last month (February in most cases).

+ + + + + + + + +
+ 🔥 Paweł Leszczyński, Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-08 09:51:27
+
+

Good news. @Paweł Leszczyński - the memory leak fixes worked. Our streaming pipelines have run through the weekend without a single OOM crash.

+ + + +
+ 🎉 Harel Shein, Peter Huang, Jakub Dardziński, Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Peter Huang + (huangzhenqiu0825@gmail.com) +
+
2024-04-08 10:23:11
+
+

*Thread Reply:* @Damien Hawes Would you please point me to the PR that fixes the issue?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-08 10:25:14
+
+

*Thread Reply:* This was the issue: https://github.com/OpenLineage/OpenLineage/issues/2561

+ +

There were two PRs:

+ +
  1. JobMetricsHolder: https://github.com/OpenLineage/OpenLineage/pull/2565
  2. UnknownEntryFacetListener: https://github.com/OpenLineage/OpenLineage/pull/2557
+
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-08 10:31:15
+
+

*Thread Reply:* @Peter Huang ^

+ + + +
+ :gratitude_thank_you: Peter Huang +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-08 12:41:47
+
+

*Thread Reply:* @Damien Hawes any other feedback for OL with streaming pipelines you have so far?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-08 12:42:20
+
+

*Thread Reply:* It generates a TON of data

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-08 12:44:19
+
+

*Thread Reply:* There are some optimisations that could be made:

+ +
  1. A lot of the facets can be cached, and don't need to be recreated every time.
  2. The connector (obviously) doesn't care about the size of the data that is being processed; rather, it cares about how frequent the Spark events are. Spark's micro-batching means that the job start -> stage submitted -> task started -> task ended -> stage complete -> job end cycle fires more frequently.
+
+ + + +
+
+
+
+ + + + + +
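[Editor's note] Caching immutable facets across micro-batches, as suggested here, trades CPU for exactly the unbounded-growth risk raised later in the thread. A minimal hedged sketch with a crude size bound — illustrative only, not the connector's actual code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class FacetCache<K, V> {
    private final int maxEntries;
    private final Map<K, V> cache = new ConcurrentHashMap<>();

    public FacetCache(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    // Build the facet at most once per key; the hard size bound is a crude
    // guard against the memory-leak problem a cache would otherwise reintroduce.
    public V get(K key, Function<K, V> build) {
        V cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        V built = build.apply(key);
        if (cache.size() < maxEntries) {
            cache.put(key, built);
        }
        return built;
    }

    public static void main(String[] args) {
        int[] builds = {0};
        FacetCache<String, String> cache = new FacetCache<>(100);
        for (int i = 0; i < 3; i++) {
            cache.get("schema:orders", k -> { builds[0]++; return "facet-for-" + k; });
        }
        System.out.println(builds[0]); // prints 1: built once, then served from cache
    }
}
```

A production version would want proper invalidation (LRU eviction or weak references) rather than a fixed cap that simply stops admitting entries.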
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-08 12:46:10
+
+

*Thread Reply:* This has an impact on any backend using it, as the run id keeps changing. This means the parent suddenly has thousands of jobs as children.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-08 12:46:29
+
+

*Thread Reply:* Our biggest pipeline generates a new event cycle every 2 minutes.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-08 12:57:23
+
+

*Thread Reply:* "Too much data" is exactly what I thought 🙂 +The obvious potential issue with caching is the same issue we just fixed... potential memory leaks, and cache invalidation

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-08 12:58:35
+
+

*Thread Reply:* > the run id keeps changing +In this case, that's a bug. We'd still need some wrapping event for whole streaming job though, probably other than application start

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-08 13:19:19
+
+

*Thread Reply:* on the other topic, did those problems stop? https://github.com/OpenLineage/OpenLineage/issues/2513 +with https://github.com/OpenLineage/OpenLineage/pull/2535/files

+
+ + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-08 11:05:44
+
+

when talking about the naming scheme for datasets, would everyone here agree that we generally use: {scheme}://{authority}/{unique_name} ? where generally authority == namespace

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-08 11:08:20
+
+

*Thread Reply:* I think so, and if we don’t then we should

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-08 11:10:31
+
+

*Thread Reply:* ~which brings me to the question why construct dataset name as such~ nvm

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-08 11:10:36
+
+

*Thread Reply:* please feel free to chime in here too https://github.com/dbt-labs/dbt-core/issues/8725

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-08 12:42:18
+
+

*Thread Reply:* > where generally authority == namespace (edited) +{scheme}://{authority} is namespace

+ + + +
+ 👍 Jakub Dardziński, Harel Shein +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-08 14:13:01
+
+

*Thread Reply:* agreed

+ + + +
+
+
+
+ + +
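[Editor's note] For the {scheme}://{authority}/{unique_name} convention the thread above converges on (namespace = {scheme}://{authority}), splitting a dataset URI into namespace and name is a small exercise; a sketch, with a made-up example URI:

```java
import java.net.URI;

public class DatasetNaming {
    // Per the convention discussed: namespace = {scheme}://{authority},
    // name = the remaining path-like unique name.
    public static String[] split(String datasetUri) {
        URI uri = URI.create(datasetUri);
        String namespace = uri.getScheme() + "://" + uri.getAuthority();
        String name = uri.getPath().replaceFirst("^/", "");
        return new String[] {namespace, name};
    }

    public static void main(String[] args) {
        String[] parts = split("postgres://db.example.com:5432/analytics.public.orders");
        System.out.println(parts[0]); // postgres://db.example.com:5432
        System.out.println(parts[1]); // analytics.public.orders
    }
}
```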
diff --git a/channel/general/index.html b/channel/general/index.html index e13edbd..19de1b5 100644 --- a/channel/general/index.html +++ b/channel/general/index.html @@ -7307,11 +7307,15 @@

Group Direct Messages

Is it the case that Open Lineage defines the general framework but doesn’t actually enforce push or pull-based implementations, it just so happens that the reference implementation (Marquez) uses push?

@@ -8043,7 +8047,7 @@

Group Direct Messages

*Thread Reply:*

- + @@ -8078,7 +8082,7 @@

Group Direct Messages

*Thread Reply:*

- + @@ -8958,11 +8962,15 @@

Supress success

*Thread Reply:*

@@ -9685,7 +9693,7 @@

Supress success

*Thread Reply:*

- + @@ -11863,11 +11871,15 @@

Supress success

Build on main passed (edited)

@@ -12784,6 +12796,43 @@

Supress success

+ +
+
+ + + + +
+ +
Luke Smith + (luke.smith@kinandcarta.com) +
+
2021-08-20 13:57:41
+
+ + +
+ + + + + + + + + +
+ + +
+
+
+
+ + @@ -12856,11 +12905,15 @@

Supress success

I added this configuration to my cluster :

@@ -12891,11 +12944,15 @@

Supress success

I receive this error message:

@@ -13097,11 +13154,15 @@

Supress success

*Thread Reply:*

@@ -13251,11 +13312,15 @@

Supress success

Now I have this:

@@ -13416,11 +13481,15 @@

Supress success

*Thread Reply:* Hi , @Luke Smith, thank you for your help, are you familiar with this error in azure databricks when you use OL?

@@ -13451,11 +13520,15 @@

Supress success

*Thread Reply:*

@@ -13508,11 +13581,15 @@

Supress success

@@ -17922,11 +17999,15 @@

Supress success

*Thread Reply:* Successfully got a basic prefect flow working

@@ -22372,29 +22453,41 @@

Supress success

I also see exceptions in Marquez logs

@@ -22847,11 +22940,15 @@

Supress success

Hey there, I’m not sure why I’m getting below error, after I ran OPENLINEAGE_URL=<http://localhost:5000> dbt-ol run , although running this command dbt debug doesn’t show any error. Pls help.

@@ -23166,20 +23263,28 @@

Supress success

*Thread Reply:* Actually i had to use venv that fixed above issue. However, i ran into another problem which is no jobs / datasets found in marquez:

@@ -23422,11 +23527,15 @@

Supress success

*Thread Reply:*

@@ -24252,20 +24361,28 @@

Supress success

@@ -24322,11 +24439,15 @@

Supress success

*Thread Reply:* oh got it, since its in default, i need to click on it and choose my dbt profile’s account name. thnx

@@ -24357,11 +24478,15 @@

Supress success

*Thread Reply:* May I know, why these highlighted ones dont have schema? FYI, I used sources in dbt.

@@ -24418,11 +24543,15 @@

Supress success

*Thread Reply:* I prepared this yaml file, not sure this is what u asked

@@ -27866,11 +27995,15 @@

Supress success

I have a dag that contains 2 tasks:

@@ -28832,11 +28965,15 @@

Supress success

@@ -28867,11 +29004,15 @@

Supress success

It created 3 namespaces. One was the one that I point in the spark config property. The other 2 are the bucket that we are writing to () and the bucket where we are reading from ()

@@ -28928,11 +29069,15 @@

Supress success

I can see if i enter in one of the weird jobs generated this:

@@ -28963,11 +29108,15 @@

Supress success

*Thread Reply:* This job with no output is a symptom of the output not being understood. you should be able to see the facets for that job. There will be a spark_unknown facet with more information about the problem. If you put that into an issue with some more details about this job we should be able to help.

@@ -29026,11 +29175,15 @@

Supress success

If I check the logs of marquez-web and marquez I can't see any error there

@@ -29061,11 +29214,15 @@

Supress success

When I try to open the job fulfilments.execute_insert_into_hadoop_fs_relation_command I see this window:

@@ -30882,11 +31039,15 @@

Supress success

I cannot see a graph of my job now. Is this something to do with the namespace names?

@@ -30995,11 +31156,15 @@

Supress success

*Thread Reply:* Here's what I mean:

- + - + + +
@@ -31226,7 +31391,7 @@

Supress success

*Thread Reply:* This is an example Lineage event JSON I am sending.

- + @@ -35361,29 +35526,41 @@

Supress success

Emitting OpenLineage events: 100%|██████████████████████████████████████████████████████| 12/12 [00:00<00:00, 12.50it/s]

@@ -35554,56 +35731,80 @@

Supress success

*Thread Reply:* There are two types of failures: tests failed on stage model (relationships) and physical error in master model (no table with such name). The stage test node in Marquez does not show any indication of failures and dataset node indicates failure but without number of failed records or table name for persistent test storage. The failed master model shows in red but no details of failure. Master model tests were skipped because of model failure but UI reports "Complete".

@@ -35638,20 +35839,28 @@

Supress success

And for dbt test failures, to visualize better that error is happening, for example like that:

@@ -35823,11 +36032,15 @@

Supress success

hello everyone , i'm learning Openlineage, I am trying to connect with airflow 2, is it possible? or that version is not yet released. this is currently throwing me airflow

@@ -36077,6 +36290,43 @@

Supress success

+ +
+
+ + + + +
+ +
David Virgil + (david.virgil.naranjo@googlemail.com) +
+
2022-01-11 12:23:41
+
+ + + + + +
+
+
+
+ + @@ -36360,11 +36610,15 @@

Supress success

@@ -36834,11 +37088,15 @@

Supress success

*Thread Reply:* It needs to show Docker Desktop is running :

@@ -37154,20 +37412,28 @@

Supress success

@@ -39803,7 +40069,7 @@

Supress success

I've attached the logs and a screenshot of what I'm seeing the Spark UI. If you had a chance to take a look, it's a bit verbose but I'd appreciate a second pair of eyes on my analysis. Hopefully I got something wrong 😅

- + @@ -39812,11 +40078,15 @@

Supress success

@@ -39983,11 +40253,15 @@

Supress success

@@ -40596,7 +40870,7 @@

Supress success

*Thread Reply:* This is the one I wrote:

- + @@ -41169,11 +41443,15 @@

Supress success

*Thread Reply:* however I can not fetch initial data when login into the endpoint

@@ -41681,11 +41959,15 @@

Supress success

https://files.slack.com/files-pri/T01CWUYP5AR-F036JKN77EW/image.png

@@ -43154,11 +43436,15 @@

Supress success

@Kevin Mellott Hello Kevin, sorry to bother you again. I was finally able to configure Marquez in AWS using an ALB. Now I am receiving this error when calling the API

@@ -44042,11 +44328,15 @@

Supress success

Am i supposed to see this when I open marquez fro the first time on an empty database?

@@ -44433,11 +44723,15 @@

Supress success

Do I follow these steps?

@@ -44549,11 +44843,15 @@

Supress success

Do i use OpenLineageURL or Marquez_URL?

@@ -44883,11 +45181,15 @@

logger = logging.getLogger(name)

@@ -48303,11 +48605,15 @@

logger = logging.getLogger(name)

Hi Everyone, Can someone please help me to debug this error ? Thank you very much all

@@ -49555,11 +49861,15 @@

logger = logging.getLogger(name)

Hello everyone, I'm learning Openlineage, I finally achieved the connection between Airflow 2+ and Openlineage+Marquez. The issue is that I don't see nothing on Marquez. Do I need to modify current airflow operators?

@@ -49642,11 +49952,15 @@

logger = logging.getLogger(name)

value: data-dev```

@@ -49704,11 +50018,15 @@

logger = logging.getLogger(name)

*Thread Reply:* Thanks, finally was my error .. I created a dummy dag to see if maybe it's an issue over the dag and now I can see something over Marquez

@@ -49824,7 +50142,7 @@

logger = logging.getLogger(name)

Any thoughts?

- + @@ -49833,7 +50151,7 @@

logger = logging.getLogger(name)

- + @@ -50911,11 +51229,15 @@

logger = logging.getLogger(name)

happy to share the slides with you if you want 👍 here’s a PDF:

@@ -51028,11 +51350,15 @@

logger = logging.getLogger(name)

Your periodical reminder that Github stars are one of those trivial things that make a significant difference for an OS project like ours. Have you starred us yet?

@@ -53756,11 +54082,15 @@

logger = logging.getLogger(name)

The picture is my custom extractor, it's not doing anything currently as this is just a test.

@@ -53843,11 +54173,15 @@

logger = logging.getLogger(name)

*Thread Reply:*

@@ -53959,11 +54293,15 @@

logger = logging.getLogger(name)

This is a similar setup as Michael had in the video.

@@ -54438,11 +54776,15 @@

[2022-05-19, 15:24:50 UTC] {taskinstance.py:1357} INFO -

I was testing https://github.com/MarquezProject/marquez/tree/main/examples/airflow#step-21-create-dag-counter, and the following error was observed in my airflow env:

@@ -55966,38 +56308,54 @@

[2022-05-19, 15:24:50 UTC] {taskinstance.py:1357} INFO -

Please reach me out if you have any questions!

@@ -56482,20 +56840,28 @@

[2022-05-19, 15:24:50 UTC] {taskinstance.py:1357} INFO -

Hi~all, I have a question about lineage. I am now running airflow 2.3.1 and have started a latest marquez service by docker-compose. I found that using the example DAG of airflow can only see the job information, but not the lineage of the job. How can I configure it to see the lineage ?

@@ -57725,20 +58091,28 @@

[2022-05-19, 15:24:50 UTC] {taskinstance.py:1357} INFO -

Hello all, after sending dbt openlineage events to Marquez, I am now looking to use the Marquez API to extract the lineage information. I am able to use python requests to call the Marquez API to get other information such as namespaces, datasets, etc., but I am a little bit confused about what I need to enter to get the lineage. I included screenshots for what the API reference shows regarding retrieving the lineage where it shows that a nodeId is required. However, this is where I seem to be having problems. It is not exactly clear where the nodeId needs to be set or what the nodeId needs to include. I would really appreciate any insights. Thank you!

@@ -57797,11 +58171,15 @@

[2022-05-19, 15:24:50 UTC] {taskinstance.py:1357} INFO -

*Thread Reply:* You can do this in a few ways (that I can think of). First, by looking for a namespace, then querying for the datasets in that namespace:

@@ -57832,11 +58210,15 @@

[2022-05-19, 15:24:50 UTC] {taskinstance.py:1357} INFO -

*Thread Reply:* Or you can search, if you know the name of the dataset:

@@ -60640,6 +61022,43 @@

[2022-05-19, 15:24:50 UTC] {taskinstance.py:1357} INFO -

+ +
+
+ + + + +
+ +
Conor Beverland + (conorbev@gmail.com) +
+
2022-06-28 20:05:54
+
+ + +
+ + + + + + + + + +
+ + +
+
+
+
+ + @@ -60668,6 +61087,43 @@

[2022-05-19, 15:24:50 UTC] {taskinstance.py:1357} INFO -

+ +
+
+ + + + +
+ +
Conor Beverland + (conorbev@gmail.com) +
+
2022-06-28 20:07:27
+
+ + +
+ + + + + + + + + +
+ + +
+
+
+
+ + @@ -63015,11 +63471,15 @@

[2022-05-19, 15:24:50 UTC] {taskinstance.py:1357} INFO -

check this out folks - marklogic datahub flow lineage into OL/marquez with jobs and runs and more. i would guess this is a pretty narrow use case but it went together really smoothly and thought i'd share sometimes it's just cool to see what people are working on

@@ -64118,11 +64578,15 @@

[2022-05-19, 15:24:50 UTC] {taskinstance.py:1357} INFO -

Hi all, I have been playing around with Marquez for a hackday. I have been able to get some lineage information loaded in (using the local docker version for now). I have been trying set the location (for the link) and description information for a job (the text saying "Nothing to show here") but I haven't been able to figure out how to do this using the /lineage api. Any help would be appreciated.

@@ -65110,11 +65574,15 @@

SundayFunday

Putting together some internal training for OpenLineage and highlighting some of the areas that have been useful to me on my journey with OpenLineage. Many thanks to @Michael Collado, @Maciej Obuchowski, and @Paweł Leszczyński for the continued technical support and guidance.

@@ -65257,20 +65725,28 @@

SundayFunday

hi all, really appreciate if anyone could help. I have been trying to create a poc project with openlineage with dbt. attached will be the pip list of the openlineage packages that i have. However, when i run "dbt-ol"command, it prompted as öpen as file, instead of running as a command. the regular dbt run can be executed without issue. i would want i had done wrong or if any configuration that i have missed. Thanks a lot

@@ -65649,7 +66125,7 @@

SundayFunday

./gradlew :shared:spotlessApply &amp;&amp; ./gradlew :app:spotlessApply &amp;&amp; ./gradlew clean build test

- + @@ -66401,11 +66877,15 @@

SundayFunday

maybe another question for @Paweł Leszczyński: I was watching the Airflow summit talk that you and @Maciej Obuchowski did ( very nice! ). How is this exposed? I'm wondering if it shows up as an edge on the graph in Marquez? ( I guess it may be tracked as a parent run and if so probably does not show on the graph directly at this time? )

@@ -66869,11 +67349,15 @@

SundayFunday

*Thread Reply:*

@@ -68877,11 +69361,15 @@

SundayFunday

*Thread Reply:* After I send COMPLETE event with the same information I can see the dataset.

@@ -68945,11 +69433,15 @@

SundayFunday

In this example I've added my-test-input on START and my-test-input2 on COMPLETE :

@@ -71716,11 +72208,15 @@

SundayFunday

Here is the Marquez UI

- + - + + +
@@ -72430,11 +72926,15 @@

SundayFunday

*Thread Reply:*

@@ -77177,11 +77677,15 @@

SundayFunday

*Thread Reply:* Apparently the value is hard coded in the code somewhere that I couldn't figure out but at-least learnt that in my Mac where this port 5000 is being held up can be freed by following the below simple step.

@@ -84818,11 +85322,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

But if I am not in a virtual environment, it installs the packages in my PYTHONPATH. You might try this to see if the dbt-ol script can be found in one of the directories in sys.path.

@@ -84853,11 +85361,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:* this can help you verify that your PYTHONPATH and PATH are correct - installing an unrelated python command-line tool and seeing if you can execute it:

@@ -89933,11 +90445,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:*

@@ -93252,11 +93768,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

Hi Team, I’m seeing creating data source, dataset API’s marked as deprecated . Can anyone point me how to create datasets via API calls?

@@ -94211,11 +94731,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

Is it possible to add column level lineage via api? Let's say I have fields A,B,C from my-input, and A,B from my-output, and B,C from my-output-s3. I want to see, filter, or query by the column name.

@@ -97313,11 +97837,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

23/04/20 10:00:15 INFO ConsoleTransport: {"eventType":"START","eventTime":"2023-04-20T10:00:15.085Z","run":{"runId":"ef4f46d1-d13a-420a-87c3-19fbf6ffa231","facets":{"spark.logicalPlan":{"producer":"https://github.com/OpenLineage/OpenLineage/tree/0.22.0/integration/spark","schemaURL":"https://openlineage.io/spec/1-0-5/OpenLineage.json#/$defs/RunFacet","plan":[{"class":"org.apache.spark.sql.catalyst.plans.logical.CreateTableAsSelect","num-children":2,"name":0,"partitioning":[],"query":1,"tableSpec":null,"writeOptions":null,"ignoreIfExists":false},{"class":"org.apache.spark.sql.catalyst.analysis.ResolvedTableName","num-children":0,"catalog":null,"ident":null},{"class":"org.apache.spark.sql.catalyst.plans.logical.Project","num-children":1,"projectList":[[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num_children":0,"name":"workorderid","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-cl

@@ -99066,11 +99594,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

Hi, I'm new to Open data lineage and I'm trying to connect snowflake database with marquez using airflow and getting the error in etl_openlineage while running the airflow dag on local ubuntu environment and unable to see the marquez UI once it etl_openlineage has ran completed as success.

@@ -99101,11 +99633,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:* What's the extract_openlineage.py file? Looks like your code?

@@ -99670,11 +100206,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:* This is my log in airflow, can you please prvide more info over it.

@@ -99735,20 +100275,28 @@

MARQUEZAPIKEY=[YOURAPIKEY]

App listening on port 3000!

@@ -99827,20 +100375,28 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:*

@@ -101255,11 +101811,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

Be on the lookout for an announcement about the next meetup!

@@ -101795,11 +102355,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

I have configured Open lineage with databricks and it is sending events to Marquez as expected. I have a notebook which joins 3 tables and write the result data frame to an azure adls location. Each time I run the notebook manually, it creates two start events and two complete events for one run as shown in the screenshot. Is this something expected or I am missing something?

@@ -102859,11 +103423,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

I have a usecase where we are connecting to Azure sql database from databricks to extract, transform and load data to delta tables. I could see the lineage is getting build, but there is no column level lineage through its 1:1 mapping from source. Could you please check and update on this.

@@ -102977,7 +103545,7 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:* Here is the code we use.

- + @@ -104093,7 +104661,7 @@

MARQUEZAPIKEY=[YOURAPIKEY]

@Paweł Leszczyński @Michael Robinson

- + @@ -108410,11 +108978,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

I can see my job there but when i click on the job when its supposed to show lineage, its just an empty screen

@@ -108535,11 +109107,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:* ohh but if i try using the console output, it throws ClientProtocolError

@@ -108596,11 +109172,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:* this is the dev console in browser

@@ -108831,11 +109411,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:* marquez didnt get updated

@@ -109339,6 +109923,43 @@

MARQUEZAPIKEY=[YOURAPIKEY]

+ +
+
+ + + + +
+ +
Rachana Gandhi + (rachana.gandhi410@gmail.com) +
+
2023-06-08 11:11:46
+ +
+
+
+ + @@ -110042,11 +110663,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

@@ -110077,11 +110702,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:* @Michael Robinson When we follow the documentation without changing anything and run sudo ./docker/up.sh we are seeing following errors:

@@ -110112,11 +110741,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:* So, I edited up.sh file and modified docker compose command by removing --log-level flag and ran sudo ./docker/up.sh and found following errors:

@@ -110147,11 +110780,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:* Then I copied .env.example to .env since compose needs .env file

@@ -110182,11 +110819,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:* I got this error:

@@ -110273,11 +110914,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:* @Michael Robinson Then it kind of worked but seeing following errors:

@@ -110308,11 +110953,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:*

@@ -110656,11 +111305,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:*

@@ -111536,7 +112189,7 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:* This is the event generated for above query.

- + @@ -111607,7 +112260,7 @@

MARQUEZAPIKEY=[YOURAPIKEY]

this is event for view for which no lineage is being generated

- + @@ -112022,11 +112675,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

It was great meeting/catching up with everyone. Hope to see you and more new faces at the next one!

@@ -112830,11 +113487,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

@@ -116216,11 +116877,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

Hi, I am running a job in Marquez with 180 rows of metadata but it is running for more than an hour. Is there a way to check the log on Marquez? Below is the screenshot of the job:

@@ -116278,11 +116943,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:* Also, yes, we have an event viewer that allows you to query the raw OL events

@@ -116339,7 +117008,7 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:*

- + @@ -117118,11 +117787,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

but the page is empty

@@ -117452,11 +118125,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

I can now see this

@@ -117487,11 +118164,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:* but when i click on the job i then get this

@@ -117548,11 +118229,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:* @George Polychronopoulos Hi, I am facing the same issue. After adding spark conf and using the docker run command, marquez is still showing empty. Do I need to change something in the run command?

@@ -119539,11 +120224,15 @@

Marquez as an OpenLineage Client

@@ -119976,20 +120665,28 @@

Marquez as an OpenLineage Client

@@ -121039,7 +121736,7 @@

Marquez as an OpenLineage Client

Expected. vs Actual.

- + @@ -121048,7 +121745,7 @@

Marquez as an OpenLineage Client

- + @@ -121066,6 +121763,56 @@

Marquez as an OpenLineage Client

+ +
+
+ + + + +
+ +
GitHubOpenLineageIssues + (githubopenlineageissues@gmail.com) +
+
2023-08-07 11:21:04
+
+ + +
+ + + + + + + + + +
+
+ + + + + + + + + +
+ + +
+
+
+
+ + @@ -124136,20 +124883,28 @@

Marquez as an OpenLineage Client

Does the OL-Spark version match the Spark version? Are there any known issues with the Spark / OL versions?

@@ -124345,20 +125100,28 @@

csv_file = location.csv

Part of the logs with the OL configurations and the processed event

@@ -124462,11 +125225,15 @@

csv_file = location.csv

@@ -125033,11 +125800,15 @@

csv_file = location.csv

*Thread Reply:* I assume the problem is somewhere there, not on the level of facet definition, since SchemaDatasetFacet looks pretty much the same and it works

@@ -125157,11 +125928,15 @@

csv_file = location.csv

*Thread Reply:*

@@ -125192,11 +125967,15 @@

csv_file = location.csv

*Thread Reply:* I think the code here filters out those string values in the list

@@ -125426,11 +126205,15 @@

csv_file = location.csv

@@ -126480,12 +127263,12 @@

csv_file = location.csv

let me update the branch and test again

- + - - + @@ -126744,6 +127527,130 @@

csv_file = location.csv

+
+
+ + + + +
+ +
savan + (SavanSharan_Navalgi@intuit.com) +
+
2024-02-28 06:02:00
+
+

*Thread Reply:* @Paweł Leszczyński @Maciej Obuchowski +can you please approve this CI to run integration tests? +https://app.circleci.com/pipelines/github/OpenLineage/OpenLineage/9497/workflows/4a20dc95-d5d1-4ad7-967c-edb6e2538820

+ + + +
+ 👍 Paweł Leszczyński +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
savan + (SavanSharan_Navalgi@intuit.com) +
+
2024-02-29 01:13:11
+
+

*Thread Reply:* @Paweł Leszczyński +only 2 Spark versions are sending empty +input and output +for both START and COMPLETE events

+ +
+

• 3.4.2 + • 3.5.0 + I can look into the above if you guide me a bit on how. + Should I open a new ticket for it? + Please suggest how to proceed.

+
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
savan + (SavanSharan_Navalgi@intuit.com) +
+
2024-03-01 04:01:45
+
+

*Thread Reply:* this integration test case led to the discovery of the above bug for Spark 3.4.2 and 3.5.0 +will that be a blocker to merging this test case? +@Paweł Leszczyński @Maciej Obuchowski

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
savan + (SavanSharan_Navalgi@intuit.com) +
+
2024-03-06 09:01:44
+
+

*Thread Reply:* @Paweł Leszczyński @Maciej Obuchowski +any direction on the above blocker will be helpful.

+ + + +
+
+
+
+ + + + +
@@ -127691,11 +128598,15 @@

csv_file = location.csv

I was doing this a second ago and this ended up with Caused by: java.lang.ClassNotFoundException: io.openlineage.spark.agent.OpenLineageSparkListener not found in com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader@1609ed55

@@ -128866,11 +129777,15 @@

csv_file = location.csv

*Thread Reply:* Can you please share with me your json conf for the cluster ?

@@ -128901,11 +129816,15 @@

csv_file = location.csv

*Thread Reply:* It's because in mu build file I have

@@ -128936,11 +129855,15 @@

csv_file = location.csv

*Thread Reply:* and the one that was copied is

@@ -132181,20 +133104,28 @@

csv_file = location.csv

Hello, I'm currently in the process of following the instructions outlined in the provided getting started guide at https://openlineage.io/getting-started/. However, I've encountered a problem while attempting to complete *Step 1* of the guide. Unfortunately, I'm encountering an internal server error at this stage. I did manage to successfully run Marquez, but it appears that there might be an issue that needs to be addressed. I have attached screen shots.

@@ -132251,11 +133182,15 @@

csv_file = location.csv

*Thread Reply:* @Jakub Dardziński port 5000 is not taken by any other application. The logs show some errors, but I am not sure what the issue is here.

@@ -134980,11 +135915,15 @@

set the log level for the openlineage spark library

*Thread Reply:* This is the error message:

@@ -135041,11 +135980,15 @@

set the log level for the openlineage spark library

I am trying to run Google Cloud Composer, where I have added the openlineage-airflow PyPI package as a dependency and set the env OPENLINEAGEEXTRACTORS to point to my custom extractor. I have added a folder named dependencies and inside it I have placed my extractor file, and the path given to OPENLINEAGEEXTRACTORS is dependencies.<filename>.<extractorclass_name>…still it fails with the exception saying No module named ‘dependencies’. Can anyone kindly help me correct my mistake?
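The "No module named 'dependencies'" error above usually means the folder holding the extractor is not on the worker's sys.path. A minimal stdlib sketch of how such a dotted path is resolved (an illustration only; the helper name and the example paths are hypothetical, though the "module.ClassName" format matches what the env var carries):

```python
import importlib


def load_extractor(dotted_path):
    # Split "pkg.module.ClassName" into module path and class name, then
    # import the module -- this import is what raises
    # "No module named 'dependencies'" when the folder is not on sys.path.
    module_path, class_name = dotted_path.rsplit(".", 1)
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# A class from an importable module resolves fine:
print(load_extractor("json.decoder.JSONDecoder").__name__)  # JSONDecoder

# A module that is not on sys.path fails at the import step:
try:
    load_extractor("dependencies.my_extractor.BigQueryInsertJobExtractor")
except ModuleNotFoundError as err:
    print(err.name)  # dependencies
```

This is why placing the folder under the DAGs directory only works if that directory is importable on every Airflow worker (e.g. it contains an `__init__.py` and is on the path).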

@@ -135365,11 +136308,15 @@

set the log level for the openlineage spark library

*Thread Reply:*

@@ -135427,11 +136374,15 @@

set the log level for the openlineage spark library

*Thread Reply:*

@@ -135488,11 +136439,15 @@

set the log level for the openlineage spark library

*Thread Reply:* https://openlineage.slack.com/files/U05QL7LN2GH/F05SUDUQEDN/screenshot_2023-09-13_at_5.31.22_pm.png

@@ -135679,7 +136634,7 @@

set the log level for the openlineage spark library

*Thread Reply:* these are the worker pod logs…where there is no log of openlineageplugin

- + @@ -135821,11 +136776,15 @@

set the log level for the openlineage spark library

*Thread Reply:* this is one of the experiments I did, but then I reverted it back to dependencies.bigqueryinsertjobextractor.BigQueryInsertJobExtractor…where dependencies is a module I created inside my dags folder

@@ -135856,11 +136815,15 @@

set the log level for the openlineage spark library

*Thread Reply:* https://openlineage.slack.com/files/U05QL7LN2GH/F05RM6EV6DV/screenshot_2023-09-13_at_12.38.55_am.png

@@ -135891,11 +136854,15 @@

set the log level for the openlineage spark library

*Thread Reply:* these are the logs of the triggerer pod specifically

@@ -135978,11 +136945,15 @@

set the log level for the openlineage spark library

*Thread Reply:* these are the logs of the worker pod at startup, where it does not complain about the plugin like the triggerer does, but when tasks are run on this worker…somehow it is not picking up the extractor for the operator I wrote it for

@@ -136272,11 +137243,15 @@

set the log level for the openlineage spark library

*Thread Reply:* I have changed the dags folder, adding the init file as you suggested, and then updated OPENLINEAGEEXTRACTORS to bigqueryinsertjob_extractor.BigQueryInsertJobExtractor…still the same thing

@@ -136502,11 +137477,15 @@

set the log level for the openlineage spark library

*Thread Reply:* I’ve done the experiment; that’s how GCS looks

@@ -136537,11 +137516,15 @@

set the log level for the openlineage spark library

*Thread Reply:* and env vars

@@ -137171,7 +138154,7 @@

set the log level for the openlineage spark library

- + @@ -137206,7 +138189,7 @@

set the log level for the openlineage spark library

*Thread Reply:*

- + @@ -139336,7 +140319,7 @@

set the log level for the openlineage spark library

I am attaching the log4j, there is no openlineagecontext

- + @@ -140331,47 +141314,67 @@

set the log level for the openlineage spark library

@@ -140422,29 +141425,41 @@

set the log level for the openlineage spark library

*Thread Reply:* A few more pics:

@@ -143258,16 +144273,20 @@

set the log level for the openlineage spark library

@here I am trying out the OpenLineage integration of Spark on Databricks. There is no event getting emitted from OpenLineage; I see logs saying OpenLineage Event Skipped. I am attaching the notebook that I am trying to run and the cluster logs. Can someone kindly help me with this?

- + @@ -144823,11 +145842,15 @@

set the log level for the openlineage spark library

*Thread Reply:* @Paweł Leszczyński this is what I am getting

@@ -144858,7 +145881,7 @@

set the log level for the openlineage spark library

*Thread Reply:* attaching the html

- + @@ -145500,11 +146523,15 @@

set the log level for the openlineage spark library

*Thread Reply:* @Paweł Leszczyński you are right. This is what we are doing as well, combining events with the same runId to process the information on our backend. But even so, there are several runIds without this information. I went through these events to have a better view of what was happening. As you can see from 7 runIds, only 3 were showing the "environment-properties" attribute. Some condition is not being met here, or maybe it is what @Jason Yip suspects and there's some sort of filtering of unnecessary events

@@ -146215,11 +147242,15 @@

set the log level for the openlineage spark library

*Thread Reply:* In Docker, the marquez-api image is not running and exits with exit code 127.

@@ -146765,11 +147796,15 @@

set the log level for the openlineage spark library

I'm upgrading the version from openlineage-airflow==0.24.0 to openlineage-airflow 1.4.1, but I'm seeing the following error; any help is appreciated

@@ -147274,11 +148309,15 @@

set the log level for the openlineage spark library

*Thread Reply:* I see the difference in calling between these 2 versions: the current version checks if Airflow is >2.6 and then directly runs on_running, but the earlier version was running it on a separate thread. Is this what's raising this exception?
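The behavioral difference described above can be sketched in plain Python (an illustration only, simplified and not the provider's actual code; function names are hypothetical): an inline call lets exceptions propagate to the task, while a call on a separate thread keeps them contained.

```python
import threading

calls = []


def dispatch(airflow_version, on_running):
    # Inline call for newer Airflow: exceptions propagate to the caller.
    if airflow_version >= (2, 6):
        on_running()
    # Older behavior: run on a separate thread, where an exception would
    # stay inside the thread instead of surfacing in the task.
    else:
        t = threading.Thread(target=on_running)
        t.start()
        t.join()


dispatch((2, 7), lambda: calls.append("inline"))
dispatch((2, 5), lambda: calls.append("threaded"))
print(calls)  # ['inline', 'threaded']
```

Under this model, an error previously hidden in the listener thread would now surface during the task run, which matches the reported exception.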

@@ -148593,7 +149632,7 @@

set the log level for the openlineage spark library

*Thread Reply:*

- + @@ -150141,7 +151180,7 @@

show data

@Paweł Leszczyński I tested 1.5.0, it works great now, but the environment facet is gone in START... which I very much want.. any thoughts?

- + @@ -152003,7 +153042,7 @@

show data

@Paweł Leszczyński I went back to 1.4.1, output does show adls location. But environment facet is gone in 1.4.1. It shows up in 1.5.0 but namespace is back to dbfs....

- + @@ -153486,11 +154525,15 @@

show data

like ( file_name, size, modification time, creation time )

@@ -154451,11 +155494,15 @@

show data

execute_spark_script(1, "/home/haneefa/airflow/dags/saved_files/")

@@ -155287,12 +156334,12 @@

Set up SparkSubmitOperator for each query

I was referring to the fluentd openlineage proxy, which lets users copy the event and send it to multiple backends. Fluentd has a list of out-of-the-box output plugins including BigQuery, S3, Redshift, and others (https://www.fluentd.org/dataoutputs)
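The copy/fan-out behavior mentioned above can be sketched in plain Python (an analogy only, not the fluentd proxy itself; the sink names and event fields are made up):

```python
def fan_out(event, sinks):
    """Deliver one lineage event to every configured sink, collecting
    per-sink errors instead of aborting the whole delivery -- the same
    idea as fluentd's copy output routing one event to many backends."""
    errors = {}
    for name, sink in sinks.items():
        try:
            sink(event)
        except Exception as exc:  # one failing backend should not block others
            errors[name] = exc
    return errors


received = []
sinks = {"s3": received.append, "bigquery": received.append}
errors = fan_out({"eventType": "COMPLETE", "job": {"name": "demo"}}, sinks)
print(len(received), errors)  # 2 {}
```

In the real setup, each sink would be an HTTP POST to a backend's OpenLineage endpoint rather than an in-memory append.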

- + - - + @@ -157316,7 +158363,7 @@

Set up SparkSubmitOperator for each query

*Thread Reply:* This text file contains a total of 10-11 events, including the start and completion events of one of my notebook runs. The process is simply reading from a Hive location and performing a full load to another Hive location.

- + @@ -160042,12 +161089,12 @@

./bin/spark-submit --class "SparkTest" --master local[**] --jars ```

Thanks 🙏

- + - - + @@ -161188,12 +162235,12 @@

./bin/spark-submit --class "SparkTest" --master local[**] --jars ```

*Thread Reply:* in Admin > Plugins can you see whether you have OpenLineageProviderPlugin and if so, are there listeners?

- + - - + @@ -161292,7 +162339,7 @@

./bin/spark-submit --class "SparkTest" --master local[**] --jars ```

*Thread Reply:* Dont

- + @@ -161353,7 +162400,7 @@

./bin/spark-submit --class "SparkTest" --master local[**] --jars ```

*Thread Reply:*

- + @@ -162629,6 +163676,39 @@

./bin/spark-submit --class "SparkTest" --master local[**] --jars ```

+ +
+
+ + + + +
+ +
Zacay Daushin + (zacayd@octopai.com) +
+
2023-12-20 07:25:53
+
+ + +
+ + + + + + + +
+ + +
+
+
+
+ + @@ -163587,12 +164667,12 @@

./bin/spark-submit --class "SparkTest" --master local[**] --jars ```

I've created a pdf with some code samples and OL inputs and output attributes.

- + - - + @@ -163600,7 +164680,7 @@

./bin/spark-submit --class "SparkTest" --master local[**] --jars ```

- + @@ -165675,12 +166755,12 @@

./bin/spark-submit --class "SparkTest" --master local[**] --jars ```

Do we have the functionality to search on the lineage we are getting?

- + - - + @@ -166783,12 +167863,12 @@

./bin/spark-submit --class "SparkTest" --master local[**] --jars ```

*Thread Reply:*

- + - - + @@ -166993,7 +168073,7 @@

Gradle 8.5

- + @@ -167599,12 +168679,12 @@

Gradle 8.5

any suggestions on naming for Graph API sources from outlook? I pull a lot of data from email attachments with Airflow. generally I am passing a resource (email address), the mailbox, and subfolder. from there I list messages and find attachments

- + - - + @@ -168448,12 +169528,12 @@

Gradle 8.5

Hello team, I see the following issue when I install apache-airflow-providers-openlineage==1.4.0

- + - - + @@ -168674,12 +169754,12 @@

Gradle 8.5

is there any solution?

- + - - + @@ -168984,12 +170064,12 @@

Gradle 8.5

- + - - + @@ -169177,6 +170257,32 @@

Gradle 8.5

+
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-18 09:02:34
+
+

*Thread Reply:* @jayant joshi did deleting all volumes work for you, or did you discover another solution? We see users encountering this error from time to time, and it would be helpful to know more.

+ + + +
+
+
+
+ + + + +
@@ -169285,7 +170391,7 @@

Gradle 8.5

- ❤️ Ross Turk, Harel Shein, tati, Rodrigo Maia, Maciej Obuchowski, Jarek Potiuk, Mattia Bertorello + ❤️ Ross Turk, Harel Shein, tati, Rodrigo Maia, Maciej Obuchowski, Jarek Potiuk, Mattia Bertorello, Sheeri Cabral (Collibra)
@@ -169355,12 +170461,12 @@

Gradle 8.5

"spark-submit --conf "spark.extraListeners=io.openlineage.spark.agent.OpenLineageSparkListener" --packages "io.openlineage:openlineage-spark:1.7.0" --conf "spark.openlineage.transport.type=http" --conf "spark.openlineage.transport.url=http://marquez-api:5000" --conf "spark.openlineage.namespace=sparkintegration" pyspark_etl.py".

- + - - + @@ -169715,12 +170821,12 @@

Gradle 8.5

*Thread Reply:* Find the attached localhost 5000 & 5001 port results. Note that while running same code in the jupyter notebook, I could see lineage on the Marquez UI. For running a code through spark-submit only I am facing an issue.

- + - - + @@ -169728,12 +170834,12 @@

Gradle 8.5

- + - - + @@ -170234,12 +171340,12 @@

Gradle 8.5

*Thread Reply:* From your code, I could see marquez-api is running successfully at "http://marquez-api:5000". Find attached screenshot.

- + - - + @@ -170485,12 +171591,12 @@

Gradle 8.5

*Thread Reply:* the quickstart guide shows this example and it produces the result with an output node in the results, but when I run this in Databricks I see no output node generated.

- + - - + @@ -170498,12 +171604,12 @@

Gradle 8.5

- + - - + @@ -170579,12 +171685,12 @@

Write the data from the source DataFrame to the destination table

*Thread Reply:* as a result, onkar_table was never recorded as a dataset, hence the lineage between mayur_table and onkar_table was not recorded either

- + - - + @@ -170592,12 +171698,12 @@

Write the data from the source DataFrame to the destination table

- + - - + @@ -170981,12 +172087,12 @@

Write the data from the source DataFrame to the destination table

Thanks.

- + - - + @@ -171473,12 +172579,12 @@

Write the data from the source DataFrame to the destination table

Error Screenshot:

- + - - + @@ -171543,12 +172649,12 @@

Write the data from the source DataFrame to the destination table

Thanks.

- + - - + @@ -171610,12 +172716,12 @@

Write the data from the source DataFrame to the destination table

*Thread Reply:* While composing up the OpenLineage docker-compose.yml, it showed the path to access JupyterLab, and I am accessing it through that path. I didn't run any command externally. Find the attached screenshot.

- + - - + @@ -171706,12 +172812,12 @@

Write the data from the source DataFrame to the destination table

I just tried to inspect the notebook container; there I could see "GRANT_SUDO=yes". Even after passing this, it's still asking for the password. Find the attached screenshot. Thanks.

- + - - + @@ -172212,12 +173318,12 @@

Write the data from the source DataFrame to the destination table

listeners should be there under OpenLineageProviderPlugin

- + - - + @@ -172408,12 +173514,12 @@

Write the data from the source DataFrame to the destination table

*Thread Reply:* This is the snapshot of my Plugins. I will also try with the configs which you mentioned.

- + - - + @@ -173078,12 +174184,12 @@

Write the data from the source DataFrame to the destination table

Thanks.

- + - - + @@ -173091,12 +174197,12 @@

Write the data from the source DataFrame to the destination table

- + - - + @@ -173104,12 +174210,12 @@

Write the data from the source DataFrame to the destination table

- + - - + @@ -173417,12 +174523,12 @@

Write the data from the source DataFrame to the destination table

Do you have any idea how to fix this?

- + - - + @@ -173484,12 +174590,12 @@

Write the data from the source DataFrame to the destination table

DETAIL: Role "marquez" does not exist.

- + - - + @@ -173549,12 +174655,12 @@

Write the data from the source DataFrame to the destination table

*Thread Reply:* Probably you might ask this.

- + - - + @@ -173615,12 +174721,12 @@

Write the data from the source DataFrame to the destination table

With this, the above error was gone. But it has an authentication error as below.

- + - - + @@ -174651,12 +175757,12 @@

Write the data from the source DataFrame to the destination table

We have gone through the OpenLineage documentation, from the documentation we could only get supported spark versions and data source types alone. Thanks.

- + - - + @@ -174912,12 +176018,12 @@

Write the data from the source DataFrame to the destination table

*Thread Reply:*

- + - - + @@ -174951,12 +176057,12 @@

Write the data from the source DataFrame to the destination table

*Thread Reply:*

- + - - + @@ -174990,12 +176096,12 @@

Write the data from the source DataFrame to the destination table

*Thread Reply:*

- + - - + @@ -175057,12 +176163,12 @@

Write the data from the source DataFrame to the destination table

I did an airflow backfill job which redownloaded all files from a SFTP (191 files) and each of those are a separate OL dataset. in this view I clicked on a single file, but because it is connected to the "extract" airflow task, it shows all of the files that task downloaded as well (dynamic mapped tasks in Airflow)

- + - - + @@ -176608,6 +177714,94 @@

Write the data from the source DataFrame to the destination table

+
+ :gratitude_thank_you: Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Rodrigo Maia + (rodrigo.maia@manta.io) +
+
2024-03-13 12:43:16
+
+

*Thread Reply:* @Matthew Paras Hi! +I'm still struggling with empty outputs on Databricks with the latest OL version.

+ +

24/03/13 16:35:56 INFO PlanUtils: apply method failed with +org.apache.spark.SparkException: There is no Credential Scope. Current env: Driver

+ +

Any idea on how to solve this?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Rodrigo Maia + (rodrigo.maia@manta.io) +
+
2024-03-13 12:53:44
+
+

*Thread Reply:* Any databricks runtime version i should test with?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Matthew Paras + (matthewparas2020@u.northwestern.edu) +
+
2024-03-13 15:35:41
+
+

*Thread Reply:* interesting, I think we're running on 13.3 LTS - we also haven't upgraded to the official OL version, still using the patched one that I built

+ + +
@@ -177825,6 +179019,123 @@

Write the data from the source DataFrame to the destination table

+
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-04 13:28:12
+
+

*Thread Reply:* @Athitya Kumar can you tell us if this resolved your issue?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Athitya Kumar + (athityakumar@gmail.com) +
+
2024-03-06 01:30:32
+
+

*Thread Reply:* @Michael Robinson - Yup, it's resolved for event types that are already being emitted from OpenLineage - but we have some events like StageCompleted / TaskEnd etc. where we don't send events currently and where we'd like to plug in our CustomFacets

+ +

https://openlineage.slack.com/archives/C01CK9T7HKR/p1709298185120219?thread_ts=1709297395.323109&cid=C01CK9T7HKR

+
+ + +
+ + + } + + Maciej Obuchowski + (https://openlineage.slack.com/team/U01RA9B5GG2) +
+ + + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-06 12:57:53
+
+

*Thread Reply:* @Athitya Kumar can you store the facets somewhere (like OpenLineageContext) and send them with complete event later?

+ + + +
+
+
+
+ + + + +
@@ -177925,12 +179236,12 @@

Write the data from the source DataFrame to the destination table

*Thread Reply:* here is an example:

- + - - + @@ -178704,12 +180015,12 @@

Write the data from the source DataFrame to the destination table

- + - - + @@ -179063,19 +180374,177 @@

Write the data from the source DataFrame to the destination table

-
+
- + -
+
Max Zheng (mzheng@plaid.com)
-
2024-02-26 12:52:47
+
2024-02-27 13:26:46
+
+

*Thread Reply:* Seems like its on OpenLineageSparkListener.onJobEnd +```24/02/25 16:12:49 INFO PlanUtils: apply method failed with +java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext. +This stopped SparkContext was created at:

+ +

org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58) +sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) +sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) +sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) +java.lang.reflect.Constructor.newInstance(Constructor.java:423) +py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247) +py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) +py4j.Gateway.invoke(Gateway.java:238) +py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80) +py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69) +py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) +py4j.ClientServerConnection.run(ClientServerConnection.java:106) +java.lang.Thread.run(Thread.java:750)

+ +

The currently active SparkContext was created at:

+ +

org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58) +sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) +sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) +sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) +java.lang.reflect.Constructor.newInstance(Constructor.java:423) +py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247) +py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) +py4j.Gateway.invoke(Gateway.java:238) +py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80) +py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69) +py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) +py4j.ClientServerConnection.run(ClientServerConnection.java:106) +java.lang.Thread.run(Thread.java:750)

+ +
at org.apache.spark.SparkContext.assertNotStopped(SparkContext.scala:121) ~[spark-core_2.12-3.3.1-amzn-0.1.jar:3.3.1-amzn-0.1]
+at org.apache.spark.sql.SparkSession.&lt;init&gt;(SparkSession.scala:113) ~[spark-sql_2.12-3.3.1-amzn-0.1.jar:3.3.1-amzn-0.1]
+at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:962) ~[spark-sql_2.12-3.3.1-amzn-0.1.jar:3.3.1-amzn-0.1]
+at org.apache.spark.sql.SQLContext$.getOrCreate(SQLContext.scala:1023) ~[spark-sql_2.12-3.3.1-amzn-0.1.jar:3.3.1-amzn-0.1]
+at org.apache.spark.sql.SQLContext.getOrCreate(SQLContext.scala) ~[spark-sql_2.12-3.3.1-amzn-0.1.jar:3.3.1-amzn-0.1]
+at org.apache.hudi.client.common.HoodieSparkEngineContext.&lt;init&gt;(HoodieSparkEngineContext.java:65) ~[hudi-spark-bundle.jar:0.12.2-amzn-0]
+at org.apache.hudi.SparkHoodieTableFileIndex.&lt;init&gt;(SparkHoodieTableFileIndex.scala:65) ~[hudi-spark-bundle.jar:0.12.2-amzn-0]
+at org.apache.hudi.HoodieFileIndex.&lt;init&gt;(HoodieFileIndex.scala:81) ~[hudi-spark-bundle.jar:0.12.2-amzn-0]
+at org.apache.hudi.HoodieBaseRelation.fileIndex$lzycompute(HoodieBaseRelation.scala:236) ~[hudi-spark-bundle.jar:0.12.2-amzn-0]
+at org.apache.hudi.HoodieBaseRelation.fileIndex(HoodieBaseRelation.scala:234) ~[hudi-spark-bundle.jar:0.12.2-amzn-0]
+at org.apache.hudi.BaseFileOnlyRelation.toHadoopFsRelation(BaseFileOnlyRelation.scala:153) ~[hudi-spark-bundle.jar:0.12.2-amzn-0]
+at org.apache.hudi.DefaultSource$.resolveBaseFileOnlyRelation(DefaultSource.scala:268) ~[hudi-spark-bundle.jar:0.12.2-amzn-0]
+at org.apache.hudi.DefaultSource$.createRelation(DefaultSource.scala:232) ~[hudi-spark-bundle.jar:0.12.2-amzn-0]
+at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:111) ~[hudi-spark-bundle.jar:0.12.2-amzn-0]
+at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:68) ~[hudi-spark-bundle.jar:0.12.2-amzn-0]
+at io.openlineage.spark.agent.lifecycle.plan.SaveIntoDataSourceCommandVisitor.apply(SaveIntoDataSourceCommandVisitor.java:140) ~[io.openlineage_openlineage-spark-1.6.2.jar:1.6.2]
+at io.openlineage.spark.agent.lifecycle.plan.SaveIntoDataSourceCommandVisitor.apply(SaveIntoDataSourceCommandVisitor.java:47) ~[io.openlineage_openlineage-spark-1.6.2.jar:1.6.2]
+at io.openlineage.spark.api.AbstractQueryPlanDatasetBuilder$1.apply(AbstractQueryPlanDatasetBuilder.java:94) ~[io.openlineage_openlineage-spark-1.6.2.jar:1.6.2]
+at io.openlineage.spark.api.AbstractQueryPlanDatasetBuilder$1.apply(AbstractQueryPlanDatasetBuilder.java:85) ~[io.openlineage_openlineage-spark-1.6.2.jar:1.6.2]
+at io.openlineage.spark.agent.util.PlanUtils.safeApply(PlanUtils.java:279) ~[io.openlineage_openlineage-spark-1.6.2.jar:1.6.2]
+at io.openlineage.spark.api.AbstractQueryPlanDatasetBuilder.lambda$apply$0(AbstractQueryPlanDatasetBuilder.java:75) ~[io.openlineage_openlineage-spark-1.6.2.jar:1.6.2]
+at java.util.Optional.map(Optional.java:215) ~[?:1.8.0_392]
+at io.openlineage.spark.api.AbstractQueryPlanDatasetBuilder.apply(AbstractQueryPlanDatasetBuilder.java:67) ~[io.openlineage_openlineage-spark-1.6.2.jar:1.6.2]
+at io.openlineage.spark.api.AbstractQueryPlanDatasetBuilder.apply(AbstractQueryPlanDatasetBuilder.java:39) ~[io.openlineage_openlineage-spark-1.6.2.jar:1.6.2]
+at io.openlineage.spark.agent.util.PlanUtils.safeApply(PlanUtils.java:279) ~[io.openlineage_openlineage-spark-1.6.2.jar:1.6.2]
+at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.lambda$null$23(OpenLineageRunEventBuilder.java:451) ~[io.openlineage_openlineage-spark-1.6.2.jar:1.6.2]
+at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_392]
+at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175) ~[?:1.8.0_392]
+at java.util.Iterator.forEachRemaining(Iterator.java:116) ~[?:1.8.0_392]
+at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) ~[?:1.8.0_392]
+at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) ~[?:1.8.0_392]
+at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[?:1.8.0_392]
+at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) ~[?:1.8.0_392]
+at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) ~[?:1.8.0_392]
+at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_392]
+at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485) ~[?:1.8.0_392]
+at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:272) ~[?:1.8.0_392]
+at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384) ~[?:1.8.0_392]
+at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) ~[?:1.8.0_392]
+at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[?:1.8.0_392]
+at java.util.stream.StreamSpliterators$WrappingSpliterator.forEachRemaining(StreamSpliterators.java:313) ~[?:1.8.0_392]
+at java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742) ~[?:1.8.0_392]
+at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) ~[?:1.8.0_392]
+at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[?:1.8.0_392]
+at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) ~[?:1.8.0_392]
+at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_392]
+at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566) ~[?:1.8.0_392]
+at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.buildOutputDatasets(OpenLineageRunEventBuilder.java:410) ~[io.openlineage_openlineage-spark-1.6.2.jar:1.6.2]
+at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.populateRun(OpenLineageRunEventBuilder.java:298) ~[io.openlineage_openlineage-spark-1.6.2.jar:1.6.2]
+at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.buildRun(OpenLineageRunEventBuilder.java:281) ~[io.openlineage_openlineage-spark-1.6.2.jar:1.6.2]
+at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.buildRun(OpenLineageRunEventBuilder.java:259) ~[io.openlineage_openlineage-spark-1.6.2.jar:1.6.2]
+at io.openlineage.spark.agent.lifecycle.SparkSQLExecutionContext.end(SparkSQLExecutionContext.java:257) ~[io.openlineage_openlineage-spark-1.6.2.jar:1.6.2]
+at io.openlineage.spark.agent.OpenLineageSparkListener.onJobEnd(OpenLineageSparkListener.java:167) ~[io.openlineage_openlineage-spark-1.6.2.jar:1.6.2]
+at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:39) ~[spark-core_2.12-3.3.1-amzn-0.1.jar:3.3.1-amzn-0.1]
+at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28) ~[spark-core_2.12-3.3.1-amzn-0.1.jar:3.3.1-amzn-0.1]
+at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) ~[spark-core_2.12-3.3.1-amzn-0.1.jar:3.3.1-amzn-0.1]
+at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) ~[spark-core_2.12-3.3.1-amzn-0.1.jar:3.3.1-amzn-0.1]
+at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117) ~[spark-core_2.12-3.3.1-amzn-0.1.jar:3.3.1-amzn-0.1]
+at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101) ~[spark-core_2.12-3.3.1-amzn-0.1.jar:3.3.1-amzn-0.1]
+at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105) ~[spark-core_2.12-3.3.1-amzn-0.1.jar:3.3.1-amzn-0.1]
+at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105) ~[spark-core_2.12-3.3.1-amzn-0.1.jar:3.3.1-amzn-0.1]
+at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23) ~[scala-library-2.12.15.jar:?]
+at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) ~[scala-library-2.12.15.jar:?]
+at <a href="http://org.apache.spark.scheduler.AsyncEventQueue.org">org.apache.spark.scheduler.AsyncEventQueue.org</a>$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100) ~[spark-core_2.12-3.3.1-amzn-0.1.jar:3.3.1-amzn-0.1]
+at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96) ~[spark-core_2.12-3.3.1-amzn-0.1.jar:3.3.1-amzn-0.1]
+at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1447) ~[spark-core_2.12-3.3.1-amzn-0.1.jar:3.3.1-amzn-0.1]
+at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96) ~[spark-core_2.12-3.3.1-amzn-0.1.jar:3.3.1-amzn-0.1]
+
+ +

24/02/25 16:13:04 INFO AsyncEventQueue: Process of event SparkListenerJobEnd(23,1708877534168,JobSucceeded) by listener OpenLineageSparkListener took 15.64437991s.
24/02/25 16:13:04 ERROR JniBasedUnixGroupsMapping: error looking up the name of group 1001: No such file or directory```

+ + + +
+
+
+
+ + + + + +
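A listener callback that takes ~15s per event, as in the log above, can back up Spark's listener bus and eventually drop events. The relevant knobs are standard Spark settings; a sketch with illustrative values — property names should be verified against the Spark configuration docs for your version:

```
# Grow the listener bus queue so bursts of SQL-execution events are not dropped
spark.scheduler.listenerbus.eventqueue.capacity=30000
# Warn when a single listener callback exceeds this threshold
spark.scheduler.listenerbus.logSlowEvent.threshold=5s
```

Raising the queue capacity trades driver memory for headroom; it does not make the slow listener faster, it only prevents event loss while the cause is investigated.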
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-02-27 19:20:10
-

Lastly, would disabling facets improve performance? eg. disabling spark.logicalPlan

+

*Thread Reply:* Hmm yeah I'm confused, https://github.com/OpenLineage/OpenLineage/blob/1.6.2/integration/spark/shared/src/main/java/io/openlineage/spark/agent/util/PlanUtils.java#L277 seems to indicate as you said (safeApply swallows the exception), but the job exits after on an error code (EMR marks the job as failed)

+ +

The crash stops if I remove spark.stop() or disable the OpenLineage listener so this is odd 🤔

+
+ + + + + + + + + + + + + + + + +
@@ -179089,7 +180558,7 @@

Write the data from the source DataFrame to the destination table

-
+
@@ -179099,15 +180568,56 @@

Write the data from the source DataFrame to the destination table

Paweł Leszczyński (pawel.leszczynski@getindata.com)
-
2024-02-27 02:26:44
+
2024-02-28 04:21:31
-

*Thread Reply:* Disabling spark.LogicalPlan may improve performance of populating OL event. It's disabled by default in recent version (the one released yesterday). You can also use circuit breaker feature if you are worried about Ol integration affecting Spark jobs

+

*Thread Reply:* 24/02/25 16:12:49 INFO PlanUtils: apply method failed with -> yeah, log level is info. It would look as if you were trying to run some action after stopping spark, but you said that disabling OpenLineage listener makes it succeed. This is odd.

-
- 🤩 Yannick Libert -
+
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-02-28 13:11:11
+
+

*Thread Reply:* Maybe its some race condition on shutdown logic with event listeners? It seems like the listener being enabled is causing executors to be spun up (which fails) after the Spark session is already stopped

+ +

• After the stacktrace above I see ConsoleTransport log some OpenLineage event data +• Then oddly it looks like a bunch of executors are launched after the Spark session has already been stopped +• These executors crash on startup which is likely whats causing the Spark job to exit with an error code +24/02/24 07:18:03 INFO ConsoleTransport: {"eventTime":"2024_02_24T07:17:05.344Z","producer":"<https://github.com/OpenLineage/OpenLineage/tree/1.6.2/integration/spark>", +... +24/02/24 07:18:06 INFO YarnAllocator: Will request 1 executor container(s) for ResourceProfile Id: 0, each with 4 core(s) and 27136 MB memory. with custom resources: &lt;memory:27136, max memory:2147483647, vCores:4, max vCores:2147483647&gt; +24/02/24 07:18:06 INFO YarnAllocator: Submitted 1 unlocalized container requests. +24/02/24 07:18:09 INFO YarnAllocator: Launching container container_1708758297553_0001_01_000004 on host {ip} for executor with ID 3 for ResourceProfile Id 0 with resources &lt;memory:27136, vCores:4&gt; +24/02/24 07:18:09 INFO YarnAllocator: Launching executor with 21708m of heap (plus 5428m overhead/off heap) and 4 cores +24/02/24 07:18:09 INFO YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them. +24/02/24 07:18:09 INFO YarnAllocator: Completed container container_1708758297553_0001_01_000003 on host: {ip} (state: COMPLETE, exit status: 1) +24/02/24 07:18:09 WARN YarnAllocator: Container from a bad node: container_1708758297553_0001_01_000003 on host: {ip}. Exit status: 1. Diagnostics: [2024-02-24 07:18:06.508]Exception from container-launch. +Container id: container_1708758297553_0001_01_000003 +Exit code: 1 +Exception message: Launch container failed +Shell error output: Nonzero exit code=1, error message='Invalid argument number' +The new executors all fail with: +Caused by: org.apache.spark.rpc.RpcEndpointNotFoundException: Cannot find endpoint: <spark://CoarseGrainedScheduler>@{ip}:{port}

+ +
@@ -179119,19 +180629,19 @@

Write the data from the source DataFrame to the destination table

-
+
- +
-
Yannick Libert - (yannick.libert.partner@decathlon.com) +
Max Zheng + (mzheng@plaid.com)
-
2024-02-27 05:20:13
+
2024-02-28 13:44:20
-

*Thread Reply:* This feature is going to be so useful for us! Love it!

+

*Thread Reply:* The debug logs from AsyncEventQueue show OpenLineageSparkListener took 21.301411402s fwiw - I'm assuming thats abnormally long

@@ -179145,51 +180655,334 @@

Write the data from the source DataFrame to the destination table

-
+
- + -
+
-
Michael Robinson - (michael.robinson@astronomer.io) +
Max Zheng + (mzheng@plaid.com)
-
2024-02-26 14:23:37
+
2024-02-28 16:07:37
-

@channel -We released OpenLineage 1.9.1, featuring: -• Airflow: add support for JobTypeJobFacet properties #2412 @mattiabertorello -• dbt: add support for JobTypeJobFacet properties #2411 @mattiabertorello -• Flink: support Flink Kafka dynamic source and sink #2417 @HuangZhenQiu -• Flink: support multi-topic Kafka Sink #2372 @pawel-big-lebowski -• Flink: support lineage for JDBC connector #2436 @HuangZhenQiu -• Flink: add common config gradle plugin #2461 @HuangZhenQiu -• Java: extend circuit breaker loaded with ServiceLoader #2435 @pawel-big-lebowski -• Spark: integration now emits intermediate, application level events wrapping entire job execution #2371 @mobuchowski -• Spark: support built-in lineage within DataSourceV2Relation #2394 @pawel-big-lebowski -• Spark: add support for JobTypeJobFacet properties #2410 @mattiabertorello -• Spark: stop sending spark.LogicalPlan facet by default #2433 @pawel-big-lebowski -• Spark/Flink/Java: circuit breaker #2407 @pawel-big-lebowski -• Spark: add the capability to publish Scala 2.12 and 2.13 variants of openlineage-spark #2446 @d-m-h -A large number of changes and bug fixes were also included. -Thanks to all our contributors with a special shout-out to @Damien Hawes, who contributed >10 PRs to this release! -Release: https://github.com/OpenLineage/OpenLineage/releases/tag/1.9.1 -Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md -Commit history: https://github.com/OpenLineage/OpenLineage/compare/1.8.0...1.9.1 -Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage -PyPI: https://pypi.org/project/openlineage-python/

+

*Thread Reply:* The yarn logs also seem to indicate the listener is somehow causing the app to start up again +2024-02-24 07:18:00,152 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (SchedulerEventDispatcher:Event Processor): container_1708758297553_0001_01_000002 Container Transitioned from RUNNING to COMPLETED +2024-02-24 07:18:00,155 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator (SchedulerEventDispatcher:Event Processor): assignedContainer application attempt=appattempt_1708758297553_0001_000001 container=null queue=default clusterResource=&lt;memory:54272, vCores:8&gt; type=OFF_SWITCH requestedPartition= +2024-02-24 07:18:00,155 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo (SchedulerEventDispatcher:Event Processor): Allocate Updates PendingContainers: 2 Decremented by: 1 SchedulerRequestKey{priority=0, allocationRequestId=0, containerToUpdate=null} for: appattempt_1708758297553_0001_000001 +2024-02-24 07:18:00,155 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (SchedulerEventDispatcher:Event Processor): container_1708758297553_0001_01_000003 Container Transitioned from NEW to ALLOCATED +Is there some logic in the listener that can create a Spark session if there is no active session?

-
- 🚀 Jakub Dardziński, Jackson Goerner, Abdallah, Yannick Libert, Mattia Bertorello, Tristan GUEZENNEC -CROIX-, Fabio Manganiello +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-02-29 03:29:40
+
+

*Thread Reply:* not sure of this, I couldn't find any place of that in code

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-02-29 05:36:43
+
+

*Thread Reply:* Probably another instance when doing something generic does not work with Hudi well 😶

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-02-29 12:44:24
+
+

*Thread Reply:* Dumb question, what info needs to be fetched from Hudi? Is this in the createRelation call? I'm surprised the logs seem to indicate Hudi table metadata seems to be being read from S3 in the listener

+ +

What would need to be implemented for proper Hudi support?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-02-29 15:06:42
+
+

*Thread Reply:* @Max Zheng well, basically we need at least proper name and namespace for the dataset. How we do that is completely dependent on the underlying code, so probably somewhere here: https://github.com/apache/hudi/blob/3a97b01c0263c4790ffa958b865c682f40b4ada4/hudi-[…]-common/src/main/scala/org/apache/hudi/HoodieBaseRelation.scala

+ +

Most likely we don't need to do any external calls or read anything from S3. It's just done because without something that understands Hudi classes we just do the generic thing (createRelation) that has the biggest chance to work.

+ +

For example, for Iceberg we can get the data required just by getting config from their catalog config - and I think with Hudi it has to work the same way, because logically - if you're reading some table, you have to know where it is or how it's named.

+
+ + + + + + + + + + + + + + + +
-
- 🎉 Abdallah, Mattia Bertorello + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-02-29 16:05:07
+
+

*Thread Reply:* That makes sense, and that info is in the hoodie.properties file that seems to be loaded based on the logs. But the events I see OL generate seem to have S3 path and S3 bucket as a the name and namespace respectively - ie. it doesn't seem to be using any of the metadata being read from Hudi? +"outputs": [ + { + "namespace": "s3://{bucket}", + "name": "{S3 prefix path}", +(we'd be perfectly happy with just the S3 path/bucket - is there a way to disable createRelation or have OL treat these Hudi as raw parquet?)

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-05 05:58:14
+
+

*Thread Reply:* > But the events I see OL generate seem to have S3 path and S3 bucket as a the name and namespace respectively - ie. it doesn't seem to be using any of the metadata being read from Hudi? +Probably yes - as I've said, the OL handling of it is just inefficient and not specific to Hudi. It's good enought that they generate something that seems to be valid dataset naming 🙂 +And, the fact it reads S3 metadata is not intended - it's just that Hudi implements createRelation this way.

+ +
+

(we'd be perfectly happy with just the S3 path/bucket - is there a way to disable createRelation or have OL treat these Hudi as raw parquet?) + The way OpenLineage Spark integration works is by looking at Optimized Logical Plan of particular Spark job. So the solution would be to implement Hudi specific path in SaveIntoDataSourceCommandVisitor or any particular other visitor that touches on the Hudi path - or, if Hudi has their own LogicalPlan nodes, implement support for it.

+
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-05 08:50:14
+
+

*Thread Reply:* (sorry for answering that late @Max Zheng, I thought I had the response send and it was sitting in my draft for few days 😞 )

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-03-06 19:37:32
+
+

*Thread Reply:* Thanks for the explanation @Maciej Obuchowski

+ +

I've been digging into the source code to see if I can help contribute Hudi support for OL. At least in SaveIntoDataSourceCommandVisitor it seems all I need to do is: +```--- a/integration/spark/shared/src/main/java/io/openlineage/spark/agent/lifecycle/plan/SaveIntoDataSourceCommandVisitor.java ++++ b/integration/spark/shared/src/main/java/io/openlineage/spark/agent/lifecycle/plan/SaveIntoDataSourceCommandVisitor.java +@@ -114,8 +114,9 @@ public class SaveIntoDataSourceCommandVisitor + LifecycleStateChange lifecycleStateChange = + (SaveMode.Overwrite == command.mode()) ? OVERWRITE : CREATE;

-      if (command.dataSource().getClass().getName().contains("DeltaDataSource")) {
+      if (command.dataSource().getClass().getName().contains("DeltaDataSource") || command.dataSource().getClass().getName().contains("org.apache.hudi.Spark32PlusDefaultSource")) {
+        if (command.options().contains("path")) {
  • log.info("Delta/Hudi data source detected, path: {}", command.options().get("path").get()); + URI uri = URI.create(command.options().get("path").get()); + return Collections.singletonList( + outputDataset() +@@ -123,6 +124,7 @@ public class SaveIntoDataSourceCommandVisitor + } +}`` +This seems to work and avoids thecreateRelation` call but I still run into the same crash 🤔 so now I'm not sure if this is a Hudi issue. Do you know of any other dependencies on the output data source? I wonder if https://openlineage.slack.com/archives/C01CK9T7HKR/p1708671958295659 rdd events could be the culprit?
  • +
+ +

I'm going to try and reproduce the crash without Hudi and just with parquet

+
+ + +
+ + + } + + Max Zheng + (https://openlineage.slack.com/team/U06L217224C) +
+ + + + + + + + + + + + + + + + +
+ +
@@ -179200,26 +180993,191 @@

Write the data from the source DataFrame to the destination table

-
+
- +
-
Jakub Dardziński - (jakub.dardzinski@getindata.com) +
Max Zheng + (mzheng@plaid.com)
-
2024-02-26 14:33:27
+
2024-03-06 20:24:14
-

*Thread Reply:* Oudstanding work @Damien Hawes 👏

+

*Thread Reply:* Hmm reading over RDDExecutionContext it seems highly unlikely anything in that would cause this crash

-
- ➕ Michael Robinson, Mattia Bertorello, Fabio Manganiello +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-07 04:53:44
+
+

*Thread Reply:* There might be other part related to reading from Hudi?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-07 04:54:22
+
+

*Thread Reply:* SaveIntoDataSourceCommandVisitor only takes care about root node of whole LogicalPlan

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-07 04:57:51
+
+

*Thread Reply:* I would serialize logical plan and take a look at leaf nodes of the job that causes hang

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-07 04:58:05
+
+

*Thread Reply:* for simple check you can just make the dataset handler that handles them return early

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-03-07 11:54:39
+
+

*Thread Reply:* https://openlineage.slack.com/archives/C01CK9T7HKR/p1708544898883449?thread_ts=1708541527.152859&cid=C01CK9T7HKR the parsed logical plan for my test job is just the SaveIntoDataSourceCommandVisitor(though I might be mis-understanding what you mean by leaf nodes)

+
+ + +
+ + + } + + Max Zheng + (https://openlineage.slack.com/team/U06L217224C) +
+ + + + + + + + + + + + + + + + +
+ +
@@ -179230,19 +181188,19 @@

Write the data from the source DataFrame to the destination table

-
+
- +
-
Abdallah - (abdallah@terrab.me) +
Max Zheng + (mzheng@plaid.com)
-
2024-02-27 00:39:29
+
2024-03-07 12:12:28
-

*Thread Reply:* Thank you 👏👏

+

*Thread Reply:* I was able to reproduce the issue with InsertIntoHadoopFsRelationCommand with aparquet write with the same job - I'm starting to suspect this is a Spark with Docker/yarn bug

@@ -179256,23 +181214,19 @@

Write the data from the source DataFrame to the destination table

-
+
- + -
+
-
Derya Meral - (drderyameral@gmail.com) +
Maciej Obuchowski + (maciej.obuchowski@getindata.com)
-
2024-02-26 15:04:33
+
2024-03-07 13:17:19
-

Hi all, I'm working on a local Airflow-OpenLineage-Marquez integration using Airflow 2.7.3 and python 3.10. Everything seems to be installed correctly with the appropriate settings. I'm seeing events, jobs, tasks trickle into the UI. I'm using the PostgresOperator. When it's time for the SQL code to be parsed, I'm seeing the following in my Airflow logs: -[2024-02-26, 19:43:17 UTC] {sql.py:457} INFO - Running statement: SELECT CURRENT_SCHEMA;, parameters: None -[2024-02-26, 19:43:17 UTC] {base.py:152} WARNING - OpenLineage provider method failed to extract data from provider. -[2024-02-26, 19:43:17 UTC] {manager.py:198} WARNING - Extractor returns non-valid metadata: None -Can anyone give me pointers on why exactly this might be happening? I've tried also with the SQLExecuteQueryOperator, same result. I previously got a Marquez setup to work with the external OpenLineage package for Airflow with Airflow 2.6.1. But I'm struggling with this newer integrated OpenLineage version

+

*Thread Reply:* Without hudi read?

@@ -179286,21 +181240,19 @@

Write the data from the source DataFrame to the destination table

-
+
- +
-
Jakub Dardziński - (jakub.dardzinski@getindata.com) +
Max Zheng + (mzheng@plaid.com)
-
2024-02-26 15:10:21
+
2024-03-07 13:17:46
-

*Thread Reply:* Does this happen for some particular SQL but works for other? -Also, my understanding is that it worked with openlineage-airflow on Airflow 2.6.1 (the same code)? -What version of OL provider are you using?

+

*Thread Reply:* Yep, it reads json and writes out as parquet

@@ -179314,23 +181266,19 @@

Write the data from the source DataFrame to the destination table

-
+
- +
-
Derya Meral - (drderyameral@gmail.com) +
Max Zheng + (mzheng@plaid.com)
-
2024-02-26 15:20:22
+
2024-03-07 13:18:27
-

*Thread Reply:* I've been using one toy DAG and have only tried with the two operators mentioned. Currently, my team's code doesn't use provider operators so it would not really work well with OL.

- -

Yes, it worked with Airflow 2.6.1. Same code.

- -

Right now, I'm using apache-airflow-providers-openlineage==1.5.0 and the other OL dependencies are at 1.9.1.

+

*Thread Reply:* We're with EMR so I created an AWS support ticket to ask whether this is a known issue with YARN/Spark on Docker

@@ -179344,19 +181292,19 @@

Write the data from the source DataFrame to the destination table

-
+
- +
-
Jakub Dardziński - (jakub.dardzinski@getindata.com) +
Maciej Obuchowski + (maciej.obuchowski@getindata.com)
-
2024-02-26 15:21:00
+
2024-03-07 13:19:53
-

*Thread Reply:* Would you want to share the SQL statement?

+

*Thread Reply:* Very interesting, would be great to see if we see more data in the metrics in the next release

@@ -179370,30 +181318,19 @@

Write the data from the source DataFrame to the destination table

-
+
- +
-
Derya Meral - (drderyameral@gmail.com) +
Max Zheng + (mzheng@plaid.com)
-
2024-02-26 15:31:42
+
2024-03-07 13:21:17
-

*Thread Reply:* It has some PII in it, but it's basically in the form of: -```DROP TABLE IF EXISTS usersmeral.keyrelations;

- -

CREATE TABLE usersmeral.keyrelations AS

- -

WITH -staff AS ( SELECT ...) -,enabled AS (SELECT ...) -SELECT ... -FROM public.borrowers -LEFT JOIN ...;``` -We're splitting the query with sqlparse.split() and feed it to a PostgresOperator.

+

*Thread Reply:* For sure, if its on master or if you have a patch I can build the jar and run my job with it if that'd be helpful

@@ -179407,31 +181344,246 @@

Write the data from the source DataFrame to the destination table

-
+
- +
-
Derya Meral - (drderyameral@gmail.com) +
Maciej Obuchowski + (maciej.obuchowski@getindata.com)
-
2024-02-27 09:26:41
+
2024-03-07 13:22:04
-

*Thread Reply:* I thought I should share our configs in case I'm missing something: -```[openlineage] -disabled = False -disabledforoperators =

+

*Thread Reply:* Not yet 😶

+ + + +
+ 🙏 Max Zheng +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-03-11 20:20:14
+
+

*Thread Reply:* After even more investigation I think I found the cause. In https://github.com/OpenLineage/OpenLineage/blob/987e5b806dc8bd6c5aab5f85c97af76a87[…]n/java/io/openlineage/spark/agent/OpenLineageSparkListener.java a SparkListenerSQLExecutionEnd event is processed after the SparkSession is stopped - I believe createSparkSQLExecutionContext is doing something weird in https://github.com/OpenLineage/OpenLineage/blob/987e5b806dc8bd6c5aab5f85c97af76a87[…]n/java/io/openlineage/spark/agent/lifecycle/ContextFactory.java at +SparkSession sparkSession = queryExecution.sparkSession(); +I'm not sure if this is defined behavior for the session to be accessed after its stopped? After I skipped the event in onOtherEvent if the session is stopped it no longer crashes trying to spin up new executors

-

namespace =

+

(I can make a Github issue + try to land a patch if you agree this seems like a bug)

+ + + + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-03-11 21:27:14
+
+

*Thread Reply:* (it might affect all events and this is just the first hit)

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-12 05:55:27
+
+

*Thread Reply:* @Max Zheng is the job particularly short lived? We've seen some times when for very short jobs we had the SparkSession stopped (especially if people close it manually) but it never led to any problems like this deadlock.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-03-12 12:20:12
+
+

*Thread Reply:* I don't think job duration is related (also its not a deadlock, its causing the app to crash https://openlineage.slack.com/archives/C01CK9T7HKR/p1709143871823659?thread_ts=1708969888.804979&cid=C01CK9T7HKR) - it failed for ~ 1 hour long job and when testing still failed when I sampled the job input with df.limit(10000). It seems like it happens on jobs where events take a long time to process (like > 20s in the other thread).

-

extractors =

+

I added this block to verify its being processed after the Spark context is stopped and to skip

-

config_path = /opt/airflow/openlineage.yml -transport =

+

```+ private boolean isSparkContextStopped() {

+    return asJavaOptional(SparkSession.getDefaultSession()
+            .map(sparkContextFromSession)
+            .orElse(activeSparkContext))
+        .map(ctx -> {
+          return ctx.isStopped();
+        })
+        .orElse(true); // If for some reason we can't get the Spark context, we assume it's stopped
+  }
+
+  @Override
+  public void onOtherEvent(SparkListenerEvent event) {
+    if (isDisabled) {
+      return;
+    }
+    if (isSparkContextStopped()) {
+      log.warn("SparkContext is stopped, skipping event: {}", event.getClass());
+      return;
+    }
This logs and no longer causes the same app to crash:
24/03/12 04:57:14 WARN OpenLineageSparkListener: SparkSession is stopped, skipping event: class org.apache.spark.sql.execution.ui.SparkListenerDriverAccumUpdates```
  • +
+
+ + +
+ + + } + + Max Zheng + (https://openlineage.slack.com/team/U06L217224C) +
+ + + + + + + + + + + + + + + + + +
@@ -179445,22 +181597,18891 @@

disablesourcecode = ```

-
+
- +
-
Derya Meral - (drderyameral@gmail.com) +
Maciej Obuchowski + (maciej.obuchowski@getindata.com)
-
2024-02-27 09:27:20
+
2024-03-12 12:29:34
-

*Thread Reply:* The YAML file: -transport: - type: http - url: <http://marquez:5000>

+

*Thread Reply:* might the crash be related to memory issue?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-12 12:29:48
+
+

*Thread Reply:* ah, I see

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-12 12:31:30
+
+

*Thread Reply:* another question, are you explicitely stopping the sparksession/sparkcontext from within your job?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-03-12 12:31:47
+
+

*Thread Reply:* Yep, it only happens where we explicitly stop with spark.stop()

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-03-13 16:18:23
+
+

*Thread Reply:* Created: https://github.com/OpenLineage/OpenLineage/issues/2513

+
+ + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-02-26 12:52:47
+
+

Lastly, would disabling facets improve performance? eg. disabling spark.logicalPlan

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-02-27 02:26:44
+
+

*Thread Reply:* Disabling spark.LogicalPlan may improve performance of populating OL event. It's disabled by default in recent version (the one released yesterday). You can also use circuit breaker feature if you are worried about Ol integration affecting Spark jobs

+ + + +
+ 🤩 Yannick Libert +
+ +
+
+
+
+ + + + + +
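For reference, the two mitigations described above are plain Spark conf entries. A minimal sketch, assuming openlineage-spark 1.9.x — the exact property names and accepted values should be checked against the OpenLineage Spark configuration docs for your version:

```
# Skip building the (potentially huge) serialized logical-plan facet
spark.openlineage.facets.disabled=[spark.logicalPlan;spark_unknown]
# Trip a circuit breaker when the driver JVM is under memory pressure,
# so the integration backs off instead of slowing down the Spark job
spark.openlineage.circuitBreaker.type=javaRuntime
spark.openlineage.circuitBreaker.memoryThreshold=20
```

The circuit breaker is the safety net: even with the logical-plan facet disabled, a pathological job can still produce expensive events, and the breaker bounds the worst case.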
+
+ + + + +
+ +
Yannick Libert + (yannick.libert.partner@decathlon.com) +
+
2024-02-27 05:20:13
+
+

*Thread Reply:* This feature is going to be so useful for us! Love it!

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-02-26 14:23:37
+
+

@channel +We released OpenLineage 1.9.1, featuring: +• Airflow: add support for JobTypeJobFacet properties #2412 @mattiabertorello +• dbt: add support for JobTypeJobFacet properties #2411 @mattiabertorello +• Flink: support Flink Kafka dynamic source and sink #2417 @HuangZhenQiu +• Flink: support multi-topic Kafka Sink #2372 @pawel-big-lebowski +• Flink: support lineage for JDBC connector #2436 @HuangZhenQiu +• Flink: add common config gradle plugin #2461 @HuangZhenQiu +• Java: extend circuit breaker loaded with ServiceLoader #2435 @pawel-big-lebowski +• Spark: integration now emits intermediate, application level events wrapping entire job execution #2371 @mobuchowski +• Spark: support built-in lineage within DataSourceV2Relation #2394 @pawel-big-lebowski +• Spark: add support for JobTypeJobFacet properties #2410 @mattiabertorello +• Spark: stop sending spark.LogicalPlan facet by default #2433 @pawel-big-lebowski +• Spark/Flink/Java: circuit breaker #2407 @pawel-big-lebowski +• Spark: add the capability to publish Scala 2.12 and 2.13 variants of openlineage-spark #2446 @d-m-h +A large number of changes and bug fixes were also included. +Thanks to all our contributors with a special shout-out to @Damien Hawes, who contributed >10 PRs to this release! +Release: https://github.com/OpenLineage/OpenLineage/releases/tag/1.9.1 +Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md +Commit history: https://github.com/OpenLineage/OpenLineage/compare/1.8.0...1.9.1 +Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage +PyPI: https://pypi.org/project/openlineage-python/

+ + + +
+ 🚀 Jakub Dardziński, Jackson Goerner, Abdallah, Yannick Libert, Mattia Bertorello, Tristan GUEZENNEC -CROIX-, Fabio Manganiello, Maciej Obuchowski +
+ +
+ 🎉 Abdallah, Mattia Bertorello +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-02-26 14:33:27
+
+

*Thread Reply:* Oudstanding work @Damien Hawes 👏

+ + + +
+ ➕ Michael Robinson, Mattia Bertorello, Fabio Manganiello, Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Abdallah + (abdallah@terrab.me) +
+
2024-02-27 00:39:29
+
+

*Thread Reply:* Thank you 👏👏

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
ldacey + (lance.dacey2@sutherlandglobal.com) +
+
2024-02-27 11:02:19
+
+

*Thread Reply:* any idea how OL releases tie into the airflow provider?

+ +

I assume that a separate apache-airflow-providers-airflow release would be made in the future to incorporate the new features/fixes?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-02-27 11:05:55
+
+

*Thread Reply:* yes, Airflow providers are released on behalf of Airflow community and different than Airflow core release

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-02-27 15:24:57
+
+

*Thread Reply:* It seems like OpenLineage Spark is still on 1.8.0? Any idea when this will be updated? Thanks!

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-02-27 15:29:28
+
+

*Thread Reply:* @Max Zheng https://openlineage.io/docs/integrations/spark/#how-to-use-the-integration

+
+
openlineage.io
+ + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-02-27 15:30:14
+
+

*Thread Reply:* Oh got it, didn't see the note +The above necessitates a change in the artifact identifier for io.openlineage:openlineage-spark. After version 1.8.0, the artifact identifier has been updated. For subsequent versions, utilize: io.openlineage:openlineage_spark_${SCALA_BINARY_VERSION}:${OPENLINEAGE_SPARK_VERSION}.

+ + + +
+
+
+
+ + + + + +
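Concretely, the Scala-suffixed coordinates look like this when attaching the integration. A sketch assuming Scala 2.12 and the 1.9.1 release — substitute the Scala binary version and OpenLineage version used by your cluster, and double-check the artifact id against Maven Central:

```
spark-submit \
  --packages io.openlineage:openlineage-spark_2.12:1.9.1 \
  --conf spark.extraListeners=io.openlineage.spark.agent.OpenLineageSparkListener \
  ...
```

Clusters on Scala 2.13 would use the `_2.13` variant introduced in this release.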
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-02-27 15:30:18
+
+

*Thread Reply:* Thanks!

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-02-27 15:30:36
+
+

*Thread Reply:* You're welcome.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Derya Meral + (drderyameral@gmail.com) +
+
2024-02-26 15:04:33
+
+

Hi all, I'm working on a local Airflow-OpenLineage-Marquez integration using Airflow 2.7.3 and python 3.10. Everything seems to be installed correctly with the appropriate settings. I'm seeing events, jobs, tasks trickle into the UI. I'm using the PostgresOperator. When it's time for the SQL code to be parsed, I'm seeing the following in my Airflow logs: +[2024-02-26, 19:43:17 UTC] {sql.py:457} INFO - Running statement: SELECT CURRENT_SCHEMA;, parameters: None +[2024-02-26, 19:43:17 UTC] {base.py:152} WARNING - OpenLineage provider method failed to extract data from provider. +[2024-02-26, 19:43:17 UTC] {manager.py:198} WARNING - Extractor returns non-valid metadata: None +Can anyone give me pointers on why exactly this might be happening? I've tried also with the SQLExecuteQueryOperator, same result. I previously got a Marquez setup to work with the external OpenLineage package for Airflow with Airflow 2.6.1. But I'm struggling with this newer integrated OpenLineage version

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-02-26 15:10:21
+
+

*Thread Reply:* Does this happen for some particular SQL but works for other? +Also, my understanding is that it worked with openlineage-airflow on Airflow 2.6.1 (the same code)? +What version of OL provider are you using?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Derya Meral + (drderyameral@gmail.com) +
+
2024-02-26 15:20:22
+
+

*Thread Reply:* I've been using one toy DAG and have only tried with the two operators mentioned. Currently, my team's code doesn't use provider operators so it would not really work well with OL.

+ +

Yes, it worked with Airflow 2.6.1. Same code.

+ +

Right now, I'm using apache-airflow-providers-openlineage==1.5.0 and the other OL dependencies are at 1.9.1.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-02-26 15:21:00
+
+

*Thread Reply:* Would you want to share the SQL statement?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Derya Meral + (drderyameral@gmail.com) +
+
2024-02-26 15:31:42
+
+

*Thread Reply:* It has some PII in it, but it's basically in the form of: +```DROP TABLE IF EXISTS usersmeral.keyrelations;

+ +

CREATE TABLE usersmeral.keyrelations AS

+ +

WITH +staff AS ( SELECT ...) +,enabled AS (SELECT ...) +SELECT ... +FROM public.borrowers +LEFT JOIN ...;``` +We're splitting the query with sqlparse.split() and feed it to a PostgresOperator.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Derya Meral + (drderyameral@gmail.com) +
+
2024-02-27 09:26:41
+
+

*Thread Reply:* I thought I should share our configs in case I'm missing something: +```[openlineage] +disabled = False +disabled_for_operators =

+ +

namespace =

+ +

extractors =

+ +

config_path = /opt/airflow/openlineage.yml +transport =

+ +

disable_source_code = ```

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Derya Meral + (drderyameral@gmail.com) +
+
2024-02-27 09:27:20
+
+

*Thread Reply:* The YAML file: +transport: + type: http + url: <http://marquez:5000>
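An http transport like the one above simply POSTs run events to the configured URL (Marquez listens on /api/v1/lineage). As a sanity check outside Airflow, one can emit a minimal event by hand; the runId and job names below are made up for illustration:

```python
import json
import urllib.request

# Minimal OpenLineage run event; runId and job names are illustrative.
event = {
    "eventType": "START",
    "eventTime": "2024-02-27T09:27:20.000Z",
    "run": {"runId": "11111111-1111-1111-1111-111111111111"},
    "job": {"namespace": "default", "name": "example_job"},
    "producer": "manual-smoke-test",
    "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json",
}

request = urllib.request.Request(
    "http://marquez:5000/api/v1/lineage",
    data=json.dumps(event).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(request)  # uncomment to actually send the event
```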

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-04 13:01:19
+
+

*Thread Reply:* Are you running on apple silicon?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Derya Meral + (drderyameral@gmail.com) +
+
2024-03-04 15:39:05
+
+

*Thread Reply:* Yep, is that the issue?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-02-28 13:00:00
+
+

@channel +Since lineage will be the focus of a panel at Data Council Austin next month, it seems like a great opportunity to organize a meetup. Please get in touch if you might be interested in attending, presenting or hosting!

+
+
datacouncil.ai
+ + + + + + + + + + + + + + + + + +
+ + + +
+ ✅ Sheeri Cabral (Collibra), Jarek Potiuk, Howard Yoo +
+ +
+ ❤️ Harel Shein, Julian LaNeve, Paweł Leszczyński, Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Declan Grant + (declan.grant@sdktek.com) +
+
2024-02-28 14:37:16
+
+

Hi all, I'm running into an unusual issue with OpenLineage on Databricks. We're using OL 1.4.1 on a cluster that runs over 100 jobs every 30 minutes. After a couple of hours, a DRIVER_NOT_RESPONDING error starts showing up in the event log with the message Driver is up but is not responsive, likely due to GC. After a DRIVER_HEALTHY event, the error occurs again several minutes later. Is this a known issue that has been solved in a later release, or is there something I can do in Databricks to stop this?

+ +
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-02-29 05:27:20
+
+

*Thread Reply:* My guess would be that with that number of jobs scheduled in a short time, the SparkListener queue grows and some internal healthcheck times out?

+ +

Maybe you could try disabling spark.logicalPlan and spark_unknown facets to see if this speeds things up.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-02-29 09:42:27
+
+

*Thread Reply:* BTW, are you receiving OL events in the meantime?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-04 12:55:50
+
+

*Thread Reply:* Hi @Declan Grant, can you tell us if disabling the facets worked?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Declan Grant + (declan.grant@sdktek.com) +
+
2024-03-04 14:30:14
+
+

*Thread Reply:* We had already tried disabling the facets, and that did not solve the issue.

+ +

Here is the relevant spark config: +spark.openlineage.transport.type console +spark.openlineage.facets.disabled [spark_unknown;spark.logicalPlan;schema;columnLineage;dataSource] +We are not interested in column lineage at this time.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Declan Grant + (declan.grant@sdktek.com) +
+
2024-03-04 14:31:28
+
+

*Thread Reply:* OL has been uninstalled from the cluster, so I can't immediately say whether events are received while the driver is not responding.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-02-28 15:19:51
+
+

@channel +This month's issue of OpenLineage News is in inboxes now! Sign up to ensure you always get the latest issue. In this edition: a rundown of open issues, new docs and new videos, plus updates on the Airflow Provider, Spark integration and Flink integration (+ more).

+
+
openlineage.us14.list-manage.com
+ + + + + + + + + + + + + + + +
+ + + +
+ 👍 Mattia Bertorello +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Simran Suri + (mailsimransuri@gmail.com) +
+
2024-03-01 01:19:04
+
+

Hi all, I've been trying to gather clues on how OpenLineage derives our inputs' namespace and name from our Spark codebase. A pointer to the exact logic would be very helpful for one of my use cases.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-03-01 02:25:10
+
+

*Thread Reply:* There is no single place where the namespace is assigned to a dataset, as this depends strictly on which datasets are read. Spark, like other OpenLineage integrations, follows the naming convention -> https://openlineage.io/docs/spec/naming
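As a rough illustration of that naming convention for URI-addressable storage (e.g. S3), the namespace is the scheme plus authority and the name is the path; a minimal sketch (the helper name is made up):

```python
from urllib.parse import urlparse

def dataset_identity(uri: str) -> tuple[str, str]:
    # Illustrative helper: derives (namespace, name) for a URI-addressable
    # dataset following the OpenLineage naming convention, e.g. for S3 the
    # namespace is "s3://{bucket}" and the name is the object path.
    parsed = urlparse(uri)
    return f"{parsed.scheme}://{parsed.netloc}", parsed.path.lstrip("/")

namespace, name = dataset_identity("s3://my-bucket/warehouse/orders")
```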

+
+
openlineage.io
+ + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Fabio Manganiello + (fabio.manganiello@booking.com) +
+
2024-03-01 04:42:12
+
+

Hi all, I'm working on propagating the parent facet from an Airflow DAG to the dbt workflows it launches, and I'm a bit puzzled by the current logic in lineage_parent_id. It generates an ID in the form namespace/name/run_id (which is the format that dbt-ol expects as well), but here name is actually a UUID generated from the job's metadata, and run_id is the internal Airflow task instance name (usually a concatenation of execution date + try number) instead of a UUID, as OpenLineage advises.

+ +

Instead of using this function I've made my own where name=<dag_id>.<task_id> (as this is the job name propagated in other OpenLineage events as well), and run_id = lineage_run_id(operator, task_instance) - basically using the UUID hashing logic for the run_id that is currently used for the name instead. This seems to be more OpenLineage-compliant and it allows us to link things properly.

+ +

Is there some reason that I'm missing behind the current logic? Things are even more confusing IMHO because there's also a new_lineage_run_id utility that calculates the run_id simply as a random UUID, without the UUID serialization logic of lineage_run_id, so it's not clear which one I'm supposed to use.

+
+ + + + + + + + + + + + + + + + +
+
+ + + + + + + + + + + + + + + + +
+ + + +
+ 👀 Kacper Muda +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Fabio Manganiello + (fabio.manganiello@booking.com) +
+
2024-03-01 05:52:28
+
+

*Thread Reply:* FYI the function I've come up with to link things properly looks like this:

+ +

```from airflow.models import BaseOperator, TaskInstance +from openlineage.airflow.macros import _JOB_NAMESPACE +from openlineage.airflow.plugin import lineage_run_id

+ +

def lineage_parent_id(self: BaseOperator, task_instance: TaskInstance) -> str: + return "/".join( + [ + _JOB_NAMESPACE, + f"{task_instance.dag_id}.{task_instance.task_id}", + lineage_run_id(self, task_instance), + ] + )```
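For reference, the three /-separated segments of such an id map directly onto the OpenLineage ParentRunFacet fields; a small sketch (assuming none of the segments themselves contain a slash, and with made-up values):

```python
def parent_run_facet(parent_id: str) -> dict:
    # Splits a "namespace/name/run_id" string into the shape of the
    # OpenLineage ParentRunFacet; assumes the segments contain no '/'.
    namespace, name, run_id = parent_id.split("/")
    return {
        "parent": {
            "run": {"runId": run_id},
            "job": {"namespace": namespace, "name": name},
        }
    }

facet = parent_run_facet(
    "airflow_namespace/my_dag.my_task/11111111-1111-1111-1111-111111111111"
)
```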

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-03-04 04:19:39
+
+

*Thread Reply:* @Paweł Leszczyński @Jakub Dardziński - any thoughts here?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-04 05:12:15
+
+

*Thread Reply:* new_lineage_run_id is some very old util method that should be deleted imho

+ +

I agree that what you propose is more OL-compliant. Indeed, what we have in the Airflow provider for the dbt Cloud integration is pretty much the same as what you have: +https://github.com/apache/airflow/blob/main/airflow/providers/dbt/cloud/utils/openlineage.py#L132

+ +

the reason for that, I think, is that the logic changed over time and the dbt-ol script just wasn't updated accordingly

+
+ + + + + + + + + + + + + + + + +
+
+ + + + + + + + + + + + + + + + +
+ + + +
+ 👍 Fabio Manganiello +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-04 12:53:44
+
+

*Thread Reply:* @Fabio Manganiello would you mind opening an issue about this on GitHub?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-04 12:54:14
+
+

*Thread Reply:* https://github.com/OpenLineage/OpenLineage/issues/2488 +there is one already 🙂 @Fabio Manganiello thank you for that!

+
+ + + + + + + +
+
Comments
+ 1 +
+ + + + + + + + + + +
+ + + +
+ :gratitude_thank_you: Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-04 13:05:13
+
+

*Thread Reply:* Oops, should have checked first! Yes, thanks Fabio

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Kacper Muda + (kacper.muda@getindata.com) +
+
2024-03-04 13:19:50
+
+

*Thread Reply:* There is also a PR already, sent as separate message by @Fabio Manganiello. And the same fix for the provider here. Some discussion is needed about what changes can we made to the macros and whether they will be "breaking", so feel free to comment.

+
+ + + + + + + + + + + + + + + + +
+
+ + + + + + + + + + + + + + + + +
+ + + +
+ :gratitude_thank_you: Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Honey Thakuria + (Honey_Thakuria@intuit.com) +
+
2024-03-01 07:49:55
+
+

Hey team, +we're trying to extract certain Spark metrics with OL using custom Facets.

+ +

But we're not getting SparkListenerTaskStart or SparkListenerTaskEnd events as part of a custom facet.

+ +

We're only able to get SparkListenerJobStart, SparkListenerJobEnd, SparkListenerSQLExecutionStart, SparkListenerSQLExecutionEnd.

+ +

This is what our custom facet code looks like: +``` @Override + protected void build(SparkListenerEvent event, BiConsumer<String, ? super TestRunFacet> consumer) { + if (event instanceof SparkListenerSQLExecutionStart) { ... } + if (event instanceof SparkListenerTaskStart) { ... }

+ +

} +But when we're executing the same Spark SQL using a custom listener without OL facets, we are able to get task-level metrics too: +public class IntuitSparkMetricsListener extends SparkListener { + @Override + public void onJobStart(SparkListenerJobStart jobStart) { + log.info("job start logging starts"); + log.info(jobStart.toString());

+ +
}
+
+
+@Override
+public void onTaskEnd(SparkListenerTaskEnd taskEnd) {
+
+ +

} +.... +}``` +Could anyone give us some input on how to get task-level metrics in the OL facet itself? +Also, is there any issue due to SparkListenerEvent vs SparkListener?

+ +

cc @Athitya Kumar @Kiran Hiremath

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-01 08:00:09
+
+

*Thread Reply:* OpenLineageSparkListener is not listening on SparkListenerTaskStart at all. It listens to SparkListenerTaskEnd , but only to fill metrics for OutputStatisticsOutputDatasetFacet

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-01 08:03:05
+
+

*Thread Reply:* I think to do this would be a not that small change, you'd need to add handling for those methods for ExecutionContexts https://github.com/OpenLineage/OpenLineage/blob/31f8ce588526e9c7c4bc7d849699cb7ce2[…]java/io/openlineage/spark/agent/lifecycle/ExecutionContext.java and OpenLineageSparkListener itself to pass it forward.

+ +

When it comes to implementation of them in particular contexts, I would make sure they don't emit unless you have something concrete set up for them, like those metrics you've set up.

+
+ + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Fabio Manganiello + (fabio.manganiello@booking.com) +
+
2024-03-04 06:57:09
+
+

Hi folks, I have created a PR to address the required changes in the Airflow lineage_parent_id macro, as discussed in my previous comment (cc @Jakub Dardziński @Damien Hawes @Mattia Bertorello)

+
+ + +
+ + + } + + Fabio Manganiello + (https://openlineage.slack.com/team/U06BV4F12JU) +
+ + + + + + + + + + + + + + + + + +
+
+ + + + + + + +
+
Labels
+ integration/airflow +
+ +
+
Comments
+ 1 +
+ + + + + + + + + + +
+ + + +
+ 👀 Kacper Muda +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-13 14:10:46
+
+

*Thread Reply:* Hey Fabio, thanks for the PR. Please let us know if you need any help with fixing tests.

+ + + +
+ 🙌 Fabio Manganiello +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-06 15:22:46
+
+

@channel +This month’s TSC meeting is next week on a new day/time: Wednesday the 13th at 9:30am PT. Please note that this will be the new day/time going forward! +On the tentative agenda: +• announcements + ◦ new integrations: DataHub and OpenMetadata + ◦ upcoming events +• recent release 1.9.1 highlights +• Scala 2.13 support in Spark overview by @Damien Hawes +• Circuit breaker in Spark & Flink @Paweł Leszczyński +• discussion items +• open discussion +More info and the meeting link can be found on the website. All are welcome! Do you have a discussion topic, use case or integration you’d like to demo? Reply here or DM me to be added to the agenda.

+
+
openlineage.io
+ + + + + + + + + + + + + + + +
+ + + +
+ 🙏 Willy Lulciuc +
+ +
+ ✅ Sheeri Cabral (Collibra) +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-03-06 19:45:11
+
+

Hi, would it be reasonable to add a flag to skip RUNNING events for the Spark integration? https://openlineage.io/docs/integrations/spark/job-hierarchy For some jobs we're seeing AsyncEventQueue report ~20s to process each event and a lot of RUNNING events being generated

+ +

IMO this might work as an alternative to https://github.com/OpenLineage/OpenLineage/issues/2375 ? It seems like it'd be more valuable to get the START/COMPLETE events vs intermediate RUNNING events

+
+
openlineage.io
+ + + + + + + + + + + + + + + +
+
+ + + + + + + +
+
Labels
+ proposal +
+ +
+
Comments
+ 1 +
+ + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-03-07 03:13:16
+
+

*Thread Reply:* Well, I think the real problem is the 20s event generation. What we should do is include the time spent in each visitor or dataset builder within a debug facet. Once this is done, we could reach out to you again to let you guide us to the code path that leads to such a scenario.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-03-07 03:13:44
+
+

*Thread Reply:* @Maciej Obuchowski do we have an issue for this? I think we discussed it recently.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-03-07 11:58:05
+
+

*Thread Reply:* > What we should do is to include timer spent on each visitor or dataset builder within debug facet. +I could help provide this data if that'd be helpful, how/what instrumentation should I add? If you've got a patch handy I could apply it locally, build, and collect this data from my test job

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-03-07 12:15:42
+
+

*Thread Reply:* It's also taking > 20s per event with Parquet writes instead of Hudi writes in my job, so I don't think that's the culprit

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-07 14:45:59
+
+

*Thread Reply:* I'm working on instrumentation/metrics right now, will be ready for next release 🙂

+ + + +
+ 🙌 Max Zheng, Sheeri Cabral (Collibra) +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-03-11 20:04:22
+
+

*Thread Reply:* I did some manual timing and 90% of the latency is from buildInputDatasets https://github.com/OpenLineage/OpenLineage/blob/987e5b806dc8bd6c5aab5f85c97af76a87[…]enlineage/spark/agent/lifecycle/OpenLineageRunEventBuilder.java

+ +

Manual as in I modified: +long startTime = System.nanoTime(); + List&lt;InputDataset&gt; datasets = + Stream.concat( + buildDatasets(nodes, inputDatasetBuilders), + openLineageContext + .getQueryExecution() + .map( + qe -&gt; + ScalaConversionUtils.fromSeq(qe.optimizedPlan().map(inputVisitor)) + .stream() + .flatMap(Collection::stream) + .map(((Class&lt;InputDataset&gt;) InputDataset.class)::cast)) + .orElse(Stream.empty())) + .collect(Collectors.toList()); + long endTime = System.nanoTime(); + double durationInSec = (endTime - startTime) / 1_000_000_000.0; + log.info("buildInputDatasets 1: {}s", durationInSec); +24/03/11 23:44:58 INFO OpenLineageRunEventBuilder: buildInputDatasets 1: 95.710143007s +Is there anything I can instrument/log to narrow down further why this is so slow? buildOutputDatasets is also kind of slow at ~10s

+
+ + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-12 05:57:58
+
+

*Thread Reply:* @Max Zheng it's not extremely easy because sometimes QueryPlanVisitors/DatasetBuilders delegate work to other ones, but I think I'll have a relatively good solution soon: https://github.com/OpenLineage/OpenLineage/pull/2496

+ + + +
+ 👍 Paweł Leszczyński, Max Zheng +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-03-12 12:20:24
+
+

*Thread Reply:* Got it, should I open a Github issue to track this?

+ +

For context the code is +def load_df_with_schema(spark: SparkSession, s3_base: str): + schema = load_schema(spark, s3_base) + file_paths = get_file_paths(spark, "/".join([s3_base, "manifest.json"])) + return spark.read.format("json").load( + file_paths, + schema=schema, + mode="FAILFAST", + ) +And the input schema has ~250 columns

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-12 12:24:00
+
+

*Thread Reply:* the instrumentation issues are already there, but please do open an issue for the slowness 👍

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-12 12:24:34
+
+

*Thread Reply:* and yes, it could be some degenerate case where we do something way more often than once

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-03-12 12:25:19
+
+

*Thread Reply:* Got it, I'll try to create a working reproduction and ticket it 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-03-13 16:18:31
+
+

*Thread Reply:* Created https://github.com/OpenLineage/OpenLineage/issues/2511

+
+ + + + + + + +
+
Comments
+ 1 +
+ + + + + + + + + + +
+ + + +
+ 👍 Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Suhas Shenoy + (ksuhasshenoy@gmail.com) +
+
2024-03-07 02:02:43
+
+

Hi team... I am trying to emit OpenLineage events from a Spark job. When I submit the job using spark-submit, this is what I see in the console.

+ +

ERROR AsyncEventQueue: Listener OpenLineageSparkListener threw an exception +io.openlineage.client.OpenLineageClientException: io.openlineage.spark.shaded.com.fasterxml.jackson.databind.JsonMappingException: Failed to find TransportBuilder (through reference chain: io.openlineage.client.OpenLineageYaml["transport"]) + at io.openlineage.client.OpenLineageClientUtils.loadOpenLineageYaml(OpenLineageClientUtils.java:149) + at io.openlineage.spark.agent.ArgumentParser.extractOpenlineageConfFromSparkConf(ArgumentParser.java:114) + at io.openlineage.spark.agent.ArgumentParser.parse(ArgumentParser.java:78) + at io.openlineage.spark.agent.OpenLineageSparkListener.initializeContextFactoryIfNotInitialized(OpenLineageSparkListener.java:277) + at io.openlineage.spark.agent.OpenLineageSparkListener.onJobStart(OpenLineageSparkListener.java:110) + at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:37) + at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28) + at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) + at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) + at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117) + at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101) + at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105) + at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105) + at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23) + at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) + at <a href="http://org.apache.spark.scheduler.AsyncEventQueue.org">org.apache.spark.scheduler.AsyncEventQueue.org</a>$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100) + at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96) + at 
org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1356) + at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96) +Caused by: io.openlineage.spark.shaded.com.fasterxml.jackson.databind.JsonMappingException: Failed to find TransportBuilder (through reference chain: io.openlineage.client.OpenLineageYaml["transport"]) + at io.openlineage.spark.shaded.com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:402) + at io.openlineage.spark.shaded.com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:361) + at io.openlineage.spark.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializerBase.wrapAndThrow(BeanDeserializerBase.java:1853) + at io.openlineage.spark.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:316) + at io.openlineage.spark.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:177) + at io.openlineage.spark.shaded.com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323) + at io.openlineage.spark.shaded.com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4825) + at io.openlineage.spark.shaded.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3809) + at io.openlineage.client.OpenLineageClientUtils.loadOpenLineageYaml(OpenLineageClientUtils.java:147) + ... 
18 more +Caused by: java.lang.IllegalArgumentException: Failed to find TransportBuilder + at io.openlineage.client.transports.TransportResolver.lambda$getTransportBuilder$3(TransportResolver.java:38) + at java.base/java.util.Optional.orElseThrow(Optional.java:403) + at io.openlineage.client.transports.TransportResolver.getTransportBuilder(TransportResolver.java:37) + at io.openlineage.client.transports.TransportResolver.resolveTransportConfigByType(TransportResolver.java:16) + at io.openlineage.client.transports.TransportConfigTypeIdResolver.typeFromId(TransportConfigTypeIdResolver.java:35) + at io.openlineage.spark.shaded.com.fasterxml.jackson.databind.jsontype.impl.TypeDeserializerBase._findDeserializer(TypeDeserializerBase.java:159) + at io.openlineage.spark.shaded.com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId(AsPropertyTypeDeserializer.java:151) + at io.openlineage.spark.shaded.com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:136) + at io.openlineage.spark.shaded.com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType(AbstractDeserializer.java:263) + at io.openlineage.spark.shaded.com.fasterxml.jackson.databind.deser.impl.FieldProperty.deserializeAndSet(FieldProperty.java:147) + at io.openlineage.spark.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:314) + ... 23 more +Can I get any help on this?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-03-07 02:20:32
+
+

*Thread Reply:* Looks like misconfigured transport. Please refer to this -> https://openlineage.io/docs/integrations/spark/configuration/transport and https://openlineage.io/docs/integrations/spark/configuration/spark_conf for more details. I think you're missing spark.openlineage.transport.type property.

+
+
openlineage.io
+ + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Suhas Shenoy + (ksuhasshenoy@gmail.com) +
+
2024-03-07 02:28:10
+
+

*Thread Reply:* This is my configuration of the transport: +conf.set("sparkscalaversion", "2.12") + conf.set("spark.extraListeners","io.openlineage.spark.agent.OpenLineageSparkListener") + conf.set("spark.openlineage.transport.type","http") + conf.set("spark.openlineage.transport.url","<http://localhost:8082>") + conf.set("spark.openlineage.transport.endpoint","/event") + conf.set("spark.extraListeners","io.openlineage.spark.agent.OpenLineageSparkListener") +During spark-submit if I include +--packages "io.openlineage:openlineage-spark:1.8.0" +I am able to receive events.

+ +

I have already included this line in build.sbt +libraryDependencies += "io.openlineage" % "openlineage-spark" % "1.8.0"

+ +

So I don't understand why I have to pass the packages again

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-03-07 03:07:04
+
+

*Thread Reply:* OK, the configuration is OK. I think that when using libraryDependencies you lose the manifest (the META-INF service-loader metadata) from within our JAR, which is used by ServiceLoader

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-03-07 03:07:40
+
+

*Thread Reply:* this is happening here -> https://github.com/OpenLineage/OpenLineage/blob/main/client/java/src/main/java/io/openlineage/client/transports/TransportResolver.java#L32

+
+ + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-03-07 03:08:37
+
+

*Thread Reply:* And this is the known issue related to this -> https://github.com/OpenLineage/OpenLineage/issues/1860

+
+ + + + + + + +
+
Assignees
+ <a href="https://github.com/pawel-big-lebowski">@pawel-big-lebowski</a> +
+ +
+
Labels
+ bug, integration/spark +
+ + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-03-07 03:09:47
+
+

*Thread Reply:* This comment -> https://github.com/OpenLineage/OpenLineage/issues/1860#issuecomment-1750536744 explains this and shows how to fix this. I am happy to help new contributors with this.

+
+ + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Suhas Shenoy + (ksuhasshenoy@gmail.com) +
+
2024-03-07 03:10:57
+
+

*Thread Reply:* Thanks for the detailed reply and pointers. Will look into it.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-07 15:56:52
+
+

@channel +The big redesign of Marquez Web is out now after a productive testing period that brought refinements and added features. Along with the wholesale redesign, which includes column lineage support, there is a new dataset tagging feature. It's worth checking out as a consumption layer in your lineage solution. A blog post with more details is coming soon, but here are some screenshots to whet your appetite. (See the thread for a screencap of the column lineage display.) +Marquez quickstart: https://marquezproject.ai/docs/quickstart/ +The release itself: https://github.com/MarquezProject/marquez/releases/tag/0.45.0

+ + + + + + +
+ 🤯 Ross Turk, Julien Le Dem, Harel Shein, Juan Luis Cano Rodríguez, Paweł Leszczyński, Mattia Bertorello, Rodrigo Maia +
+ +
+ ❤️ Harel Shein, Peter Huang, Kengo Seki, Paul Wilson Villena, Paweł Leszczyński, Mattia Bertorello, alexandre bergere, Rodrigo Maia, Maciej Obuchowski, Ernie Ostic, Dongjin Seo +
+ +
+ ✅ Sheeri Cabral (Collibra) +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Cory Visi + (cvisi@amazon.com) +
+
2024-03-07 17:34:18
+
+

*Thread Reply:* Are those field descriptions coming from emitted events? or from a defined schema that's being added by marquez?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Ted McFadden + (tmcfadden@consoleconnect.com) +
+
2024-03-07 17:51:42
+
+

*Thread Reply:* Nice work! Are there any examples of the mode being switched from Table level to Column level, or do I misunderstand what mode is?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-07 17:52:11
+
+

*Thread Reply:* @Cory Visi Those are coming from the events. The screenshots are of the UI seeded with metadata. You can find the JSON used for this here: https://github.com/MarquezProject/marquez/blob/main/docker/metadata.json

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-07 17:53:38
+
+

*Thread Reply:* The three screencaps in my first message actually don't include the column lineage display feature (but there are lots of other upgrades in the release)

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-07 17:55:56
+
+

*Thread Reply:* column lineage view:

+ + + + +
+ ❤️ Paweł Leszczyński, Rodrigo Maia, Cory Visi +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Ted McFadden + (tmcfadden@consoleconnect.com) +
+
2024-03-07 18:01:21
+
+

*Thread Reply:* Thanks, that's what I wanted to get a look at. Cheers

+ + + +
+ 👍 Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-07 18:01:25
+
+

*Thread Reply:* @Ted McFadden what the initial 3 screencaps show is switching between the graph view and detailed views of the datasets and jobs

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
David Sharp + (davidsharp7@gmail.com) +
+
2024-03-07 23:59:42
+
+

*Thread Reply:* Hey with the tagging we’ve identified a slight bug - PR has been put into fix.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Rodrigo Maia + (rodrigo.maia@manta.io) +
+
2024-03-08 05:31:15
+
+

*Thread Reply:* The "query" section looks awesome, Congrats!!! But from the openlineage side, when is the query attribute available?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Cory Visi + (cvisi@amazon.com) +
+
2024-03-08 07:36:29
+
+

*Thread Reply:* Fantastic work!

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-08 07:55:30
+
+

*Thread Reply:* @Rodrigo Maia the OpenLineage spec supports this via the SQLJobFacet. See: https://github.com/OpenLineage/OpenLineage/blob/main/spec/facets/SQLJobFacet.json
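The SQLJobFacet is a small job facet whose only payload field is the query text; an illustrative fragment of how it appears under a job's facets (the _producer value is a placeholder):

```python
# Illustrative SQLJobFacet payload as it would appear under job["facets"].
sql_job_facet = {
    "sql": {
        "_producer": "https://example.com/my-producer",  # placeholder
        "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/SQLJobFacet.json",
        "query": "SELECT id, name FROM customers",
    }
}
```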

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Ernie Ostic + (ernie.ostic@getmanta.com) +
+
2024-03-08 08:42:40
+
+

*Thread Reply:* Thanks Michael....do we have a list of which providers are known to be populating the SQL JobFacet (assuming that the solution emitting the events uses SQL and has access to it)?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-08 08:59:24
+
+

*Thread Reply:* @Maciej Obuchowski or @Jakub Dardziński can add more detail, but this doc has a list of operators supported by the SQL parser.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-08 09:01:13
+
+

*Thread Reply:* yeah, so basically any of the operators that are SQL-compatible - SQLExecuteQueryOperator + Athena, BQ I think

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Ernie Ostic + (ernie.ostic@getmanta.com) +
+
2024-03-08 09:05:45
+
+

*Thread Reply:* Thanks! That helps for Airflow --- do we know if any other Providers are fully supporting this powerful facet?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-08 09:07:45
+
+

*Thread Reply:* whoa, powerful 😅 +I just checked the sources; the only one missing from the above is CopyFromExternalStageToSnowflakeOperator

+ +

are you interested in some specific ones?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-08 09:08:49
+
+

*Thread Reply:* and ofc you can have SQLJobFacet coming from dbt or spark as well or any other systems triggered via Airflow

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Ernie Ostic + (ernie.ostic@getmanta.com) +
+
2024-03-08 11:03:36
+
+

*Thread Reply:* Thanks Jakub. It will be interesting to know which providers we are certain provide SQL, that are entirely independent of Airflow.

+ + + +
+ ✅ Sheeri Cabral (Collibra) +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-08 11:07:50
+
+

*Thread Reply:* I don’t think we have any facet-oriented docs (e.g. what produces SQLJobFacet) and if that makes sense

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Ernie Ostic + (ernie.ostic@getmanta.com) +
+
2024-03-08 11:14:40
+
+

*Thread Reply:* Thanks. Ultimately, it's a bigger question that we've talked about before, about best ways to document and validate what things/facets you can support/consume (as a consumer) or which you support/populate as a provider.

+ + + +
+ ✅ Sheeri Cabral (Collibra) +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-08 11:16:05
+
+

*Thread Reply:* The doc that @Michael Robinson shared is automatically generated from Airflow code, so it should provide the best option for build-in operators. If we're talking about providers/operators outside Airflow repo, then I think @Julien Le Dem’s registry proposal would best support that need

+ + + +
+ ☝️ Jakub Dardziński, Ernie Ostic +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Athitya Kumar + (athityakumar@gmail.com) +
+
2024-03-07 23:44:08
+
+

Hey team. Is column/attribute level lineage supported for input/topic Kafka topic ports in the OpenLineage Flink listener?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-03-08 02:07:58
+
+

*Thread Reply:* Column level lineage is currently not supported for Flink

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Ruchira Prasad + (ruchiraprasad@gmail.com) +
+
2024-03-08 04:57:20
+
+

Is it possible to explain the "OTHER" run state to me, and whether we can use it to send lineage events to check the health of a service that runs in the background and is triggered at intervals? +It would be really helpful if someone could send an example JSON for the "OTHER" run state

+ +
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-03-08 05:17:19
+
+

*Thread Reply:* The example idea behind other was: imagine a system that requests compute resources and would like to emit an OpenLineage event about the request being made. That's why other can occur before start. The other idea was to allow other elsewhere to provide agility for new scenarios. However, we want to restrict which event types are terminating ones, and we don't want other there. This is important for lineage consumers: when they receive a terminating event for a given run, they know all the events related to the run were emitted.
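To illustrate, a hedged sketch of what an OTHER event could look like — all names, IDs, and URLs here are placeholders, not an official example:

```python
import json

# Hypothetical "OTHER" run event, e.g. emitted when compute resources are
# requested before the run actually starts. A terminating COMPLETE/FAIL/ABORT
# event would still be expected later for the same runId.
other_event = {
    "eventType": "OTHER",
    "eventTime": "2024-03-08T09:00:00Z",
    "run": {"runId": "0195c1a1-0000-7000-8000-000000000001"},
    "job": {"namespace": "my-namespace", "name": "background_sync_job"},
    "producer": "https://example.com/my-producer",
    "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/definitions/RunEvent",
}
print(json.dumps(other_event, indent=2))
```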

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Ruchira Prasad + (ruchiraprasad@gmail.com) +
+
2024-03-08 05:38:21
+
+

*Thread Reply:* @Paweł Leszczyński Is it possible to track the health of a service by using OpenLineage events? If so, how? +As an example, I have a Windows service, and I want to make sure the service is up and running.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-03-08 05:53:58
+
+

*Thread Reply:* depends on what you mean by service. If you consider a data processing job a service, then you can track whether it successfully completes.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-08 07:08:46
+
+

*Thread Reply:* I think other systems would be more suited for healthchecks, like OpenTelemetry or Datadog

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Efthymios Hadjimichael + (ehadjimichael@id5.io) +
+
2024-03-08 07:22:03
+
+

hey there, trying to configure databricks spark with the openlineage spark listener 🧵

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Efthymios Hadjimichael + (ehadjimichael@id5.io) +
+
2024-03-08 07:22:52
+
+

*Thread Reply:* databricks runtime for clusters: +14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12) +we are shipping a global init script that looks like the following: +```#!/bin/bash +VERSION="1.9.1" +SCALA_VERSION="2.12" +wget -O /mnt/driver-daemon/jars/openlineage-spark$${SCALA_VERSION}-$${VERSION}.jar https://repo1.maven.org/maven2/io/openlineage/openlineage-spark$${SCALA_VERSION}/$${VERSION}/openlineage-spark$${SCALA_VERSION}-$${VERSION}.jar

+ +

SPARK_DEFAULTS_FILE="/databricks/driver/conf/00-openlineage-defaults.conf"

+ +

if [[ $DB_IS_DRIVER = "TRUE" ]]; then + cat > $SPARK_DEFAULTS_FILE <<- EOF + [driver] { + "spark.extraListeners" = "com.databricks.backend.daemon.driver.DBCEventLoggingListener,io.openlineage.spark.agent.OpenLineageSparkListener" + "spark.openlineage.version" = "v1" + "spark.openlineage.transport.type" = "http" + "spark.openlineage.transport.url" = "https://some.url" + "spark.openlineage.dataset.removePath.pattern" = "(\/[a-z]+[-a-zA-Z0-9]+)+(?<remove>.*)" + "spark.openlineage.namespace" = "some_namespace" + } +EOF +fi``` +with openlineage-spark 1.9.1

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Efthymios Hadjimichael + (ehadjimichael@id5.io) +
+
2024-03-08 07:23:38
+
+

*Thread Reply:* getting fatal exceptions: +24/03/07 14:14:05 ERROR DatabricksMain$DBUncaughtExceptionHandler: Uncaught exception in thread spark-listener-group-shared! +java.lang.NoClassDefFoundError: com/databricks/sdk/scala/dbutils/DbfsUtils + at io.openlineage.spark.agent.facets.builder.DatabricksEnvironmentFacetBuilder.getDbfsUtils(DatabricksEnvironmentFacetBuilder.java:124) + at io.openlineage.spark.agent.facets.builder.DatabricksEnvironmentFacetBuilder.getDatabricksEnvironmentalAttributes(DatabricksEnvironmentFacetBuilder.java:92) + at io.openlineage.spark.agent.facets.builder.DatabricksEnvironmentFacetBuilder.build(DatabricksEnvironmentFacetBuilder.java:58) +and spark driver crashing when spark runs

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Efthymios Hadjimichael + (ehadjimichael@id5.io) +
+
2024-03-08 07:28:43
+
+

*Thread Reply:* browsing the code for 1.9.1 shows that the exception comes from trying to access the class for databricks dbfsutils here

+ +

should I file a bug on github, or am I doing something very wrong here?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-08 07:53:00
+
+

*Thread Reply:* Looks like something has changed in the Databricks 14 🤔

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-08 07:53:17
+
+

*Thread Reply:* Issue on GitHub is the right way

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Efthymios Hadjimichael + (ehadjimichael@id5.io) +
+
2024-03-08 07:53:49
+
+

*Thread Reply:* thanks, opening one now with this information.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Efthymios Hadjimichael + (ehadjimichael@id5.io) +
+
2024-03-08 09:21:24
+
+

*Thread Reply:* link to issue for anyone interested, thanks again!

+ + + +
+ 👍 Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Abdallah + (abdallah@terrab.me) +
+
2024-03-15 10:09:00
+
+

*Thread Reply:* Hi @Maciej Obuchowski I am having the same issue with older versions of Databricks.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Abdallah + (abdallah@terrab.me) +
+
2024-03-18 02:47:30
+
+

*Thread Reply:* I don't think the Spark integration is working anymore for any of the Databricks environments, not only version 14.

+ + + +
+ ➕ Tristan GUEZENNEC -CROIX- +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Abdallah + (abdallah@terrab.me) +
+
2024-03-18 05:38:05
+ +
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-18 07:14:09
+
+

*Thread Reply:* @Abdallah are you willing to provide PR?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Abdallah + (abdallah@terrab.me) +
+
2024-03-18 11:51:20
+
+

*Thread Reply:* I am having a look

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Abdallah + (abdallah@terrab.me) +
+
2024-03-20 04:45:02
+
+

*Thread Reply:* https://github.com/OpenLineage/OpenLineage/pull/2530

+
+ + + + + + + +
+
Labels
+ integration/spark +
+ + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
slackbot + +
+
2024-03-08 12:04:26
+
+

This message was deleted.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-08 12:13:32
+
+

*Thread Reply:* is what you sent an event for DAG or task?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-08 12:22:32
+
+

*Thread Reply:* so far Marquez cannot show job hierarchy (DAG is parent to tasks), so you need to click on one of the tasks in the UI to see the proper view

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-08 12:33:25
+
+

*Thread Reply:* is this the only job listed?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-08 12:33:37
+
+

*Thread Reply:* no, I can see 191 total

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-08 12:34:22
+
+

*Thread Reply:* what if you choose any other job that has ACustomingestionDag. prefix?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-08 12:39:24
+
+

*Thread Reply:* you also have namespaces in right upper corner. datasets are probably in different namespace than Airflow jobs

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-08 12:47:52
+
+

*Thread Reply:* https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/supported_classes.html

+ +

this is the list of supported operators currently

+ +

not all of them send dataset information, e.g. PythonOperator

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Nargiza Fernandez + (nargizafernandez@gmail.com) +
+
2024-03-08 14:06:35
+
+

hi everyone!

+ +

I configured OpenLineage + Marquez for my Amazon-managed Apache Airflow to get better insight into the DAGs. For the implementation I followed the https://aws.amazon.com/blogs/big-data/automate-data-lineage-on-amazon-mwaa-with-openlineage/ guide, using the helm/k8s option. +Marquez is up and running and I can see my DAGs and dependent DAGs in the jobs section; however, when clicking on any of the DAGs in the jobs list, I see only one job without any dependencies. I would like to see the whole chain of task execution. How can I achieve this goal? Please advise.

+ +

additional information: +we don't have Datasets in our MWAA. +MWAA Airflow - v. 2.7.2 +Openlineage plugin.py - +from airflow.plugins_manager import AirflowPlugin +from airflow.models import Variable +import os

+ +

os.environ["OPENLINEAGE_URL"] = Variable.get('OPENLINEAGE_URL', default_var='')

+ +

class EnvVarPlugin(AirflowPlugin): + name = "env_var_plugin"

+ +

requirements.txt: +httplib2 +urllib3 +oauth2client +bingads +pymssql +certifi +facebook_business +mysql-connector-python +google-api-core +google-auth +google-api-python-client +apiclient +google-auth-httplib2 +google-auth-oauthlib +pymongo +pandas +numpy +pyarrow +apache-airflow-providers-openlineage

+ +

Also, where can I find the meaning of the Depth, complete mode, and compact nodes options? I believe these are view options?

+ +

Thank you in advance for your help!

+ +
+ + + + + + + + + +
+ + + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Willy Lulciuc + (willy@datakin.com) +
+
2024-03-08 14:17:50
+
+

*Thread Reply:* Jobs may not have any dependencies depending on the Airflow operator used (ex: PythonOperator). Can you provide the OL events for the job you expect to have inputs/outputs? In the Marquez Web UI, you can use the events tab:

+ + + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Nargiza Fernandez + (nargizafernandez@gmail.com) +
+
2024-03-08 14:42:14
+
+

*Thread Reply:* I expect to see dependencies from all my jobs. I was hoping Marquez would show a similar view to Airflow's, making it easier to troubleshoot failed DAGs. Please refer to the image below.

+ +
+ + + + + + + + + +
+
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Nargiza Fernandez + (nargizafernandez@gmail.com) +
+
2024-03-08 17:02:09
+
+

*Thread Reply:* is this what you requested?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Nargiza Fernandez + (nargizafernandez@gmail.com) +
+
2024-03-11 10:19:28
+
+

*Thread Reply:* hello! @Willy Lulciuc could you please guide me further? what can be done to see the whole chain of DAG execution in openlineage/marquez?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Nargiza Fernandez + (nargizafernandez@gmail.com) +
+
2024-03-11 14:42:01
+
+

*Thread Reply:* from textwrap import dedent +import mysql.connector +import pymongo +import logging +import sys +import ast +from airflow import DAG +from airflow.operators.python import PythonOperator +from airflow.operators.trigger_dagrun import TriggerDagRunOperator +from airflow.operators.python import BranchPythonOperator +from airflow.providers.http.operators.http import SimpleHttpOperator +from airflow.models import Variable +from bson.objectid import ObjectId +we do use PythonOperator, however we are specifying task dependencies in the DAG code, example:

+ +

error_task = PythonOperator( + task_id='error', + python_callable=error, + dag=dag, + trigger_rule = "one_failed" +) + +transformed_task >> generate_dict >> api_trigger_dependent_dag >> error_task +for this case is there a way to have a detailed view in the Marquez Web UI?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Nargiza Fernandez + (nargizafernandez@gmail.com) +
+
2024-03-11 14:50:17
+
+

*Thread Reply:* @Jakub Berezowski hello! could you please take a look at my case and advice what can be done whenever you have time? thank you!

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Suresh Kumar + (ssureshkumar6@gmail.com) +
+
2024-03-10 04:35:02
+
+

Hi All, +I'm based out of Sydney and we are using OpenLineage on the Azure data platform. +I'm looking for some direction and support on where we are currently stuck with lineage creation from Spark (Azure Synapse Analytics): +PySpark is not able to emit lineage when some complex transformations are happening. +The OpenLineage version we are currently using is v0.18 and the Spark version is 3.2.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Kacper Muda + (kacper.muda@getindata.com) +
+
2024-03-11 03:54:43
+
+

*Thread Reply:* Hi, could you provide some more details on the issue you are facing? Some debug logs, specific error message, pyspark code that causes the issue? Also, current OpenLineage version is 1.9.1 , is there any reason you are using an outdated 0.18?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Suresh Kumar + (ssureshkumar6@gmail.com) +
+
2024-03-11 19:15:18
+
+

*Thread Reply:* Thanks for the heads-up. We are in the process of upgrading the library and will get back to you.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Kylychbek Zhumabai uulu + (kylychbekeraliev2000@gmail.com) +
+
2024-03-11 12:51:09
+
+

Hello everyone, has anyone integrated AWS MWAA with OpenLineage? I'm trying it but it is not working. Can you share some ideas and steps if you have experience with that?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-12 12:37:47
+
+

@channel +This month's TSC meeting, open to all, is tomorrow at 9:30 PT. The updated agenda includes exciting news of new integrations and presentations by @Damien Hawes and @Paweł Leszczyński. Hope to see you there! https://openlineage.slack.com/archives/C01CK9T7HKR/p1709756566788589

+
+ + +
+ + + } + + Michael Robinson + (https://openlineage.slack.com/team/U02LXF3HUN7) +
+ + + + + + + + + + + + + + + + + +
+ + + +
+ 🚀 Mattia Bertorello, Maciej Obuchowski, Sheeri Cabral (Collibra), Paweł Leszczyński +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Suhas Shenoy + (ksuhasshenoy@gmail.com) +
+
2024-03-13 10:28:41
+
+

Hi team.. If we are trying to send OpenLineage events from a Spark job to a Kafka endpoint that requires keystore- and truststore-related properties to be configured, how can we configure them?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Kacper Muda + (kacper.muda@getindata.com) +
+
2024-03-13 10:33:48
+
+

*Thread Reply:* Hey, check out these docs and the spark.openlineage.transport.properties.[xxx] configuration. Is this what you are looking for?
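If I read that page correctly, anything under spark.openlineage.transport.properties.[xxx] is passed through to the Kafka producer, so the standard Kafka SSL keys should apply. A hypothetical sketch — the broker address, paths, and passwords are placeholders:

```properties
spark.openlineage.transport.type=kafka
spark.openlineage.transport.topicName=openlineage-events
spark.openlineage.transport.properties.bootstrap.servers=broker:9093
spark.openlineage.transport.properties.security.protocol=SSL
spark.openlineage.transport.properties.ssl.truststore.location=/path/to/truststore.jks
spark.openlineage.transport.properties.ssl.truststore.password=changeit
spark.openlineage.transport.properties.ssl.keystore.location=/path/to/keystore.jks
spark.openlineage.transport.properties.ssl.keystore.password=changeit
```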

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Suhas Shenoy + (ksuhasshenoy@gmail.com) +
+
2024-03-13 11:08:49
+
+

*Thread Reply:* Yes... Thanks

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Rodrigo Maia + (rodrigo.maia@manta.io) +
+
2024-03-13 11:46:09
+
+

Hello all 👋! +Has anyone tried to use spark udfs with openlineage? +Does it make sense for the column-level lineage to stop working in this context?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-03-15 08:47:54
+
+

*Thread Reply:* did you investigate if it still works on a table-level?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-03-15 08:49:50
+
+

*Thread Reply:* (I haven’t tried it, but looking at spark UDFs it looks like there are many differences - https://medium.com/@suffyan.asad1/a-deeper-look-into-spark-user-defined-functions-537c6efc5fb3 - nothing is jumping out at me as “this is why it doesn’t work” though.)

+
+
Medium
+ + + + + + +
+
Reading time
+ 10 min read +
+ + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-03-14 03:49:21
+
+

This week brought us many fixes to the Flink integration, like: +• #2507, which resolves critical issues introduced in the recent release, +• #2508, which makes JDBC dataset naming consistent with the dataset naming convention and introduces common code for Spark & Flink to extract the dataset identifier from the JDBC connection url, +• #2512, which includes the database schema in the dataset identifier for the JDBC integration in Flink. +These are significant improvements and I think they should not wait for the next release cycle. +I would like to start a vote for an immediate release.

+ + + +
+ ➕ Kacper Muda, Paweł Leszczyński, Mattia Bertorello, Maciej Obuchowski, Harel Shein, Damien Hawes, Peter Huang +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-14 10:46:42
+
+

*Thread Reply:* Thanks, all. The release is approved.

+ + + +
+ 🙌 Paweł Leszczyński +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-14 15:26:58
+
+

*Thread Reply:* Changelog PR is here: https://github.com/OpenLineage/OpenLineage/pull/2516

+
+ + + + + + + +
+
Labels
+ documentation +
+ + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-15 11:05:02
+
+

@channel +We released OpenLineage 1.10.2, featuring:

+ +

Additions +• Dagster: add new provider for version 1.6.10 #2518 @JDarDagran +• Flink: support lineage for a hybrid source #2491 @HuangZhenQiu +• Flink: bump Flink JDBC connector version #2472 @HuangZhenQiu +• Java: add a OpenLineageClientUtils#loadOpenLineageJson(InputStream) and change OpenLineageClientUtils#loadOpenLineageYaml(InputStream) methods #2490 @d-m-h +• Java: add info from the HTTP response to the client exception #2486 @davidjgoss +• Python: add support for MSK IAM authentication with a new transport #2478 @mattiabertorello +Removal +• Airflow: remove redundant information from facets #2524 @kacpermuda +Fixes +• Airflow: proceed without rendering templates if task_instance copy fails #2492 @kacpermuda +• Flink: fix class not found issue for Cassandra #2507 @pawel-big-lebowski +• Flink: refine the JDBC table name #2512 @HuangZhenQiu +• Flink: fix JDBC dataset naming #2508 @pawel-big-lebowski +• Flink: fix failure due to missing Cassandra classes #2507 @pawel-big-lebowski +• Flink: fix release runtime dependencies #2504 @HuangZhenQiu +• Spark: fix the HttpTransport timeout #2475 @pawel-big-lebowski +• Spark: prevent NPE if the context is null #2515 @pawel-big-lebowski +• Spec: improve Cassandra lineage metadata #2479 @HuangZhenQiu +Thanks to all the contributors with a shout out to @Maciej Obuchowski for the after-hours CI fix! +Release: https://github.com/OpenLineage/OpenLineage/releases/tag/1.10.2 +Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md +Commit history: https://github.com/OpenLineage/OpenLineage/compare/1.9.1...1.10.2 +Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage +PyPI: https://pypi.org/project/openlineage-python/

+ + + +
+ 🚀 Maciej Obuchowski, Kacper Muda, Mattia Bertorello, Paweł Leszczyński +
+ +
+ 🔥 Maciej Obuchowski, Mattia Bertorello, Paweł Leszczyński, Peter Huang +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
GUNJAN YADU + (gunjanyadu6@gmail.com) +
+
2024-03-18 08:12:43
+
+

Hi, I am new to OpenLineage. Can someone help me understand how exactly it is set up, and how I can set it up on my personal laptop and play with it to gain hands-on experience?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Kacper Muda + (kacper.muda@getindata.com) +
+
2024-03-18 08:15:17
+
+

*Thread Reply:* Hey, check out our Getting Started guide, and the whole documentation on Python, Java, Spark etc., where you will find all the information about the setup and configuration. For Airflow>=2.7, there is separate documentation

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
GUNJAN YADU + (gunjanyadu6@gmail.com) +
+
2024-03-18 08:52:41
+
+

*Thread Reply:* I am getting this error when I am following the commands on my Windows laptop: +git clone git@github.com:MarquezProject/marquez.git && cd marquez/docker +running up.sh --seed +marquez-api | WARNING 'MARQUEZ_CONFIG' not set, using development configuration. +seed-marquez-with-metadata | wait-for-it.sh: waiting 15 seconds for api:5000 +marquez-web | [HPM] Proxy created: /api/v1 -> http://api:5000/ +marquez-web | App listening on port 3000! +marquez-api | INFO [2024-03-18 12:45:01,702] org.eclipse.jetty.util.log: Logging initialized @1991ms to org.eclipse.jetty.util.log.Slf4jLog +marquez-api | INFO [2024-03-18 12:45:01,795] io.dropwizard.server.DefaultServerFactory: Registering jersey handler with root path prefix: / +marquez-api | INFO [2024-03-18 12:45:01,796] io.dropwizard.server.DefaultServerFactory: Registering admin handler with root path prefix: / +marquez-api | INFO [2024-03-18 12:45:01,797] io.dropwizard.assets.AssetsBundle: Registering AssetBundle with name: graphql-playground for path /graphql-playground/** +marquez-api | INFO [2024-03-18 12:45:01,807] marquez.MarquezApp: Running startup actions... +marquez-api | INFO [2024-03-18 12:45:01,842] org.flywaydb.core.internal.license.VersionPrinter: Flyway Community Edition 8.5.13 by Redgate +marquez-api | INFO [2024-03-18 12:45:01,842] org.flywaydb.core.internal.license.VersionPrinter: See what's new here: https://flywaydb.org/documentation/learnmore/releaseNotes#8.5.13 +marquez-api | INFO [2024-03-18 12:45:01,842] org.flywaydb.core.internal.license.VersionPrinter: +marquez-db | 2024-03-18 12:45:02.039 GMT [34] FATAL: password authentication failed for user "marquez" +marquez-db | 2024-03-18 12:45:02.039 GMT [34] DETAIL: Role "marquez" does not exist. +marquez-db | Connection matched pg_hba.conf line 100: "host all all all scram-sha-256" +marquez-api | ERROR [2024-03-18 12:45:02,046] org.apache.tomcat.jdbc.pool.ConnectionPool: Unable to create initial connections of pool. +marquez-api | ! 
org.postgresql.util.PSQLException: FATAL: password authentication failed for user "marquez"

+ +

Do I have to do any additional setup to run Marquez locally?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Kacper Muda + (kacper.muda@getindata.com) +
+
2024-03-18 09:02:47
+
+

*Thread Reply:* I don't think OpenLineage and Marquez support Windows in any way

+ + + +
+ 👍 Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-18 09:04:57
+
+

*Thread Reply:* But another way to explore OL and Marquez is with GitPod: https://github.com/MarquezProject/marquez?tab=readme-ov-file#try-it

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-18 09:05:17
+
+

*Thread Reply:* Also, @GUNJAN YADU have you tried deleting all volumes and starting over?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
GUNJAN YADU + (gunjanyadu6@gmail.com) +
+
2024-03-18 09:10:49
+
+

*Thread Reply:* Volumes as in?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Kacper Muda + (kacper.muda@getindata.com) +
+
2024-03-18 09:12:21
+
+

*Thread Reply:* Probably docker volumes, you can find them in docker dashboard app:

+ +
+ + + + + + + + + +
+ + +
+ 👍 Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
GUNJAN YADU + (gunjanyadu6@gmail.com) +
+
2024-03-18 09:13:44
+
+

*Thread Reply:* Okay +It's a password authentication failure. So do I have to do any kind of Postgres setup or environment variable setup?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
GUNJAN YADU + (gunjanyadu6@gmail.com) +
+
2024-03-18 09:24:29
+
+

*Thread Reply:* marquez-db | 2024-03-18 13:19:37.211 GMT [36] FATAL: password authentication failed for user "marquez" +marquez-db | 2024-03-18 13:19:37.211 GMT [36] DETAIL: Role "marquez" does not exist.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
GUNJAN YADU + (gunjanyadu6@gmail.com) +
+
2024-03-18 10:11:43
+
+

*Thread Reply:* Setup is successful

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-18 11:20:43
+
+

*Thread Reply:* @GUNJAN YADU can you share what steps you took to make it work?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
GUNJAN YADU + (gunjanyadu6@gmail.com) +
+
2024-03-19 00:14:17
+
+

*Thread Reply:* First I cleared the volumes +Then did the steps mentioned in the link you shared, in Git Bash. +It worked then

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-19 09:00:19
+
+

*Thread Reply:* Ah, so you used GitPod?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
GUNJAN YADU + (gunjanyadu6@gmail.com) +
+
2024-03-21 00:35:58
+
+

*Thread Reply:* No +I haven’t. I ran all the commands in git bash

+ + + +
+ 👍 Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Rohan Doijode + (doijoderohan882@gmail.com) +
+
2024-03-19 08:06:07
+
+

Hi everyone !

+ +

I'm beginner to this tool.

+ +

My name is Rohan, and I'm facing challenges with Marquez. I have followed the steps as mentioned on the website and am facing this error. Please check the attached picture.

+ +
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-19 09:35:16
+
+

*Thread Reply:* Hi Rohan, welcome! There are a number of guides across the OpenLineage and Marquez sites. Would you please share a link to the guide you are using? Also, terminal output as well as version and system information would be helpful. The issue could be a simple config problem or more complicated, but it's impossible to say from the screenshot.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Rohan Doijode + (doijoderohan882@gmail.com) +
+
2024-03-20 01:47:22
+
+

*Thread Reply:* Hi Michael Robinson,

+ +

Thank you for reverting on this.

+ +

The link I used for installation : https://openlineage.io/getting-started/

+ +

I have attached the terminal output.

+ +

Docker version : 25.0.3, build 4debf41

+
+
openlineage.io
+ + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Rohan Doijode + (doijoderohan882@gmail.com) +
+
2024-03-20 01:48:55
+
+

*Thread Reply:* Continuing above thread with a screenshot :

+ +
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-20 11:25:36
+
+

*Thread Reply:* Thanks for the details, @Rohan Doijode. Unfortunately, Windows isn't currently supported. To explore OpenLineage+Marquez on Windows we recommend using this pre-configured Marquez Gitpod environment.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Rohan Doijode + (doijoderohan882@gmail.com) +
+
2024-03-21 00:49:41
+
+

*Thread Reply:* Hi @Michael Robinson,

+ +

Thank you for your input.

+ +

My issues has been resolved.

+ + + +
+ 🎉 Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Kacper Muda + (kacper.muda@getindata.com) +
+
2024-03-19 11:37:02
+
+

Hey team! Quick check - has anyone submitted or is planning to submit a CFP for this year's Airflow Summit with an OL talk? Let me know! 🚀

+ + + +
+ ➕ Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-19 11:40:11
+
+

*Thread Reply:* https://sessionize.com/airflow-summit-2024/

+
+
sessionize.com
+ + + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-19 11:40:22
+
+

*Thread Reply:* the CFP is scheduled to close on April 17

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Kacper Muda + (kacper.muda@getindata.com) +
+
2024-03-19 11:40:59
+
+

*Thread Reply:* Yup. I was thinking about submitting one, but don't want to overlap with someone that already did 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Anand Thamothara Dass + (anand_thamotharadass@cable.comcast.com) +
+
2024-03-19 14:54:06
+
+

Hey Team, We are using MWAA (AWS Managed Airflow), which is on version 2.7.2, so we are making use of the Airflow-provided OpenLineage packages. We have a simple test DAG which uses BashOperator, and we would like to use manually annotated lineage, so we have provided the inlets and outlets. But when I run the job, I see the error - Failed to extract metadata using found extractor <airflow.providers.openlineage.extractors.bash.BashExtractor object at 0x7f9446276190> - section/key [openlineage/disabled_for_operators]. Do I need to make any configuration changes?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-19 15:26:53
+
+

*Thread Reply:* hey, there’s a fix for that: https://github.com/apache/airflow/pull/37994 +not released yet.

+ +

Unfortunately, before the release you need to manually set missing entries in configuration

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Anand Thamothara Dass + (anand_thamotharadass@cable.comcast.com) +
+
2024-03-19 16:15:18
+
+

*Thread Reply:* Thanks @Jakub Dardziński. So the temporary fix is to set disabled_for_operators for the unsupported operators? If I do that, do I get my lineage emitted for BashOperator with the manually annotated information?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-19 16:15:59
+
+

*Thread Reply:* I think you should set it for disabled_for_operators, config_path and transport entries (maybe you’ve set some of them already)

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Anand Thamothara Dass + (anand_thamotharadass@cable.comcast.com) +
+
2024-03-19 16:23:25
+
+

*Thread Reply:* Ok . Thanks. Yes I did them already.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Anand Thamothara Dass + (anand_thamotharadass@cable.comcast.com) +
+
2024-03-19 22:03:04
+
+

*Thread Reply:* These are my configurations. It's emitting run events only. I have my manually annotated lineage defined for the BashOperator. When I provide disabled_for_operators, I don't see any errors, but the log clearly says "Skipping extraction for operator BashOperator", so I don't see the inlets & outlets info in Marquez. If I don't provide disabled_for_operators, it fails with the error "Failed to extract metadata using found extractor <airflow.providers.openlineage.extractors.bash.BashExtractor object at 0x7f9446276190> - section/key [openlineage/disabled_for_operators]". So I cannot go either way. Any workaround? Or am I making some mistake?

+ +
+ + + + + + + + + +
+
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Kacper Muda + (kacper.muda@getindata.com) +
+
2024-03-20 02:28:53
+
+

*Thread Reply:* Hey @Anand Thamothara Dass, make sure to simply set the config_path , disabled_for_operators and transport entries to empty strings, unless you actually want to use them (e.g. leave transport as it is if it contains the configuration for the backend). The current issue is that when no entries are found, the error is raised regardless of whether an actual value is set - they simply need to be present in the configuration, even as empty strings.

+ +

In your setup I see that you included BashOperator in disabled_for_operators, so that's why it's ignored.
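A minimal airflow.cfg sketch of the workaround described above (the transport value here is just a placeholder assumption — keep your own backend configuration):

```ini
# Hedged sketch: pre-fix workaround — the [openlineage] section must contain
# these keys, even as empty strings, or the provider raises the section/key error.
[openlineage]
config_path =
disabled_for_operators =
transport = {"type": "http", "url": "http://localhost:5000", "endpoint": "api/v1/lineage"}
```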

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Anand Thamothara Dass + (anand_thamotharadass@cable.comcast.com) +
+
2024-03-20 12:03:55
+
+

*Thread Reply:* Hmm, strange. Setting them to empty strings worked. When I display it in the console, I am able to see all the outlets information. But when I transport it to the Marquez endpoint, I see only run events. No dataset information is captured in Marquez. But when I build the payload myself outside Airflow and push it using Postman, I am able to see the dataset information in Marquez as well. So I don't know where the issue is: Airflow, OpenLineage, or Marquez 😕

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Kacper Muda + (kacper.muda@getindata.com) +
+
2024-03-20 12:07:07
+
+

*Thread Reply:* Could you share your DAG code and task logs for that operator? I think if you use BashOperator and attach inlets and outlets to it, it should work just fine. Also, please share the name and version of the OL package you are using.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Anand Thamothara Dass + (anand_thamotharadass@cable.comcast.com) +
+
2024-03-20 14:57:40
+
+

*Thread Reply:* @Kacper Muda - Got that fixed. I had {"type": "http", "url": "http://&lt;ip&gt;:3000", "endpoint": "api/v1/lineage"}. Got the endpoint removed and kept only {"type": "http", "url": "http://&lt;ip&gt;:3000"}. It worked. Didn't think that the api/v1/lineage endpoint setting would force only run-event capture. Thanks for all the support!!!

+ + + +
+ 👍 Jakub Dardziński, Kacper Muda +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Rohan Doijode + (doijoderohan882@gmail.com) +
+
2024-03-21 07:44:29
+
+

Hi all,

+ +

We are planning to use OL as a data lineage tool.

+ +

We have data in S3 and use AWS Kinesis. We are looking for guidelines on generating a graphical representation in Marquez or another compatible tool.

+ +

This includes column-level lineage and metadata during ETL.

+ +

Thank you in advance

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-03-21 09:06:06
+
+

Hello all, we are struggling with a Spark integration with AWS Glue. We have gotten to a configuration that is not causing errors in Spark, but it's not producing any output in the S3 bucket. Can anyone help figure out what's wrong? (code in thread)

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-03-21 09:06:35
+
+

*Thread Reply:* ```import sys +from awsglue.transforms import * +from awsglue.utils import getResolvedOptions +from pyspark.context import SparkContext +from awsglue.context import GlueContext +from awsglue.job import Job +from pyspark.context import SparkConf +from pyspark.sql import SparkSession

+ +

args = getResolvedOptions(sys.argv, ["JOB_NAME"]) +print(f'the job name received is : {args["JOB_NAME"]}')

+ +

spark1 = SparkSession.builder.appName("OpenLineageExample").config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener").config("spark.openlineage.transport.type", "file").config("spark.openlineage.transport.location", "").config("spark.openlineage.namespace", "AWSGlue").getOrCreate()

+ +

glueContext = GlueContext(sc)

+ +

Initialize the glue context

+ +

sc = SparkContext(spark1)

+ +

glueContext = GlueContext(spark1) +spark = glueContext.spark_session

+ +

job = Job(glueContext) +job.init(args["JOB_NAME"], args)

+ +

df=spark.read.format("csv").option("header","true").load("s3://<bucket>/input/Master_Extract/") +df.write.format('csv').option('header','true').save(' + + + +

+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-03-21 09:07:05
+
+

*Thread Reply:* cc @Rodrigo Maia since I know you’ve done some AWS glue

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-03-21 11:41:39
+
+

*Thread Reply:* Several things:

+ +
  1. s3 isn't a file system. It is an object storage system. Concretely, this means when an object is written, it's immutable. If you want to update the object, you need to read it in its entirety, modify it, and then write it back.
  2. Java probably doesn't know how to handle the s3 protocol.
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-03-21 11:41:54
+
+

*Thread Reply:* (As opposed the the file protocol)

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-03-21 12:05:15
+
+

*Thread Reply:* OK, so the problem is we’ve set it to config(“spark.openlineage.transport.type”, “file”) +and then give it s3:// instead of a file path…..

+ +

But it’s AWS Glue so we don’t have a local filesystem to save it to.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-03-21 12:05:55
+
+

*Thread Reply:* (I also hear you that S3 isn’t an ideal place for concatenating to a logfile because you can’t concatenate)

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-03-21 12:20:46
+
+

*Thread Reply:* Unfortunately, I have zero experience with Glue.

+ +

Several approaches:

+ +
  1. Emit to Kafka (you can use MSK)
  2. Emit to Kinesis
  3. Emit to Console (perhaps a centralised logging tool, like Cloudwatch will pick it up)
  4. Emit to a local file, but I have no idea how you retrieve that file.
  5. Emit to an HTTP endpoint
+ + + +
+ ☝️ Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-03-21 12:22:25
+
+

*Thread Reply:* I appreciate some ideas for next steps

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-03-21 12:22:30
+
+

*Thread Reply:* Thank you

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Rodrigo Maia + (rodrigo.maia@manta.io) +
+
2024-03-21 12:25:30
+
+

*Thread Reply:* Did you try the console transport to check if the OL setup is working? Regardless of I/O, it should put something in the logs with an event.
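For reference, a spark-defaults-style sketch of the console transport suggested above (property names taken from the OpenLineage Spark configuration docs; the namespace value just reuses the one from earlier in this thread):

```properties
# Hedged sketch: emit OpenLineage events to the driver logs instead of a file/S3
spark.extraListeners=io.openlineage.spark.agent.OpenLineageSparkListener
spark.openlineage.transport.type=console
spark.openlineage.namespace=AWSGlue
```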

+ + + +
+ 👀 Sheeri Cabral (Collibra) +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-03-21 12:36:41
+
+

*Thread Reply:* Assuming the log4j[2].properties file is configured to allow the io.openlineage package to log at the appropriate level.

+ + + +
+ 👀 Sheeri Cabral (Collibra) +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
tati + (tatiana.alchueyr@astronomer.io) +
+
2024-03-22 07:01:47
+
+

*Thread Reply:* @Sheeri Cabral (Collibra), did you try to use a different transport type, as suggested by @Damien Hawes in https://openlineage.slack.com/archives/C01CK9T7HKR/p1711038046057459?thread_ts=1711026366.869199&cid=C01CK9T7HKR? And described in the docs: +https://openlineage.io/docs/integrations/spark/configuration/transport#file

+ +

Or would you like for the OL spark driver to support an additional transport type (e.g. s3) to emit OpenLineage events?

+
+ + +
+ + + } + + Damien Hawes + (https://openlineage.slack.com/team/U05FLJE4GDU) +
+ + + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-03-22 09:40:39
+
+

*Thread Reply:* I will try different transport types, haven’t gotten a chance to yet.

+ + + +
+ 🙌 tati +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
tati + (tatiana.alchueyr@astronomer.io) +
+
2024-03-25 07:05:17
+
+

*Thread Reply:* Thanks, @Sheeri Cabral (Collibra); please let us know how it goes!

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Pooja K M + (pooja.km@philips.com) +
+
2024-04-02 05:06:26
+
+

*Thread Reply:* @Sheeri Cabral (Collibra) did you try any of the other transport types by any chance?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-04-03 08:32:20
+
+

*Thread Reply:* Sorry, with the holiday long weekend in Europe things are a bit slow. We did, and I just put a message in the #general chat https://openlineage.slack.com/archives/C01CK9T7HKR/p1712147347085319 as we are getting some errors with the spark integration.

+
+ + +
+ + + } + + Sheeri Cabral + (https://openlineage.slack.com/team/U0323HG8C8H) +
+ + + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Rodrigo Maia + (rodrigo.maia@manta.io) +
+
2024-03-22 14:45:12
+
+

I've been testing around with different Spark versions. Does anyone know if OpenLineage works with Spark 2.4.4 (Scala 2.12.10)? I've been getting a lot of errors, but I've only tried versions 1.8+.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-22 16:36:32
+
+

*Thread Reply:* Hi @Rodrigo Maia, OpenLineage does not officially support Spark 2.4.4. The earliest version supported is 2.4.6. See this doc for more information about the supported versions of Spark, Airflow, Dagster, dbt, and Flink.

+ + + +
+ 👍 Rodrigo Maia +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-03-23 04:15:27
+
+

*Thread Reply:* OpenLineage CI runs against 2.4.6 and it is passing. I wouldn't expect any breaking differences between 2.4.4 and 2.4.6, but please let us know if this is the case.

+ + + +
+ 👍 Rodrigo Maia +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-22 15:18:52
+
+

@channel +Thanks to everyone who attended our first Boston meetup, co-sponsored by Astronomer and Collibra and featuring presentations by partners at Collibra, Astronomer and DataDog, this past Tuesday at Microsoft New England. Shout out to @Sheeri Cabral (Collibra), @Jonathan Morin, and @Paweł Leszczyński for presenting and to Sheeri for co-hosting! Topics included: +• "2023 in OpenLineage," a big year that saw: + ◦ 5 new integrations, + ◦ the Airflow Provider launch, + ◦ the addition of static/"design-time" lineage in 1.0.0, + ◦ the addition of column lineage from SQL statements via the SQL parser, + ◦ and 22 releases. +• A demo of Marquez, which now supports column-level lineage in a revamped UI +• Discussion of "Why Do People Use Lineage?" by Sheeri at Collibra, covering: + ◦ differences between design and operational lineage, + ◦ use cases served such as compliance, traceability/provenance, impact analysis, migration validation, and quicker onboarding, + ◦ features of Collibra's lineage +• A demo of streaming support in the Apache Flink integration by Paweł at Astronomer, illustrating lineage from: + ◦ a Flink job reading from a Kafka topic to Postgres, + ◦ a few SQL jobs running queries in Postgres, + ◦ a Flink job taking a Postgres table and publishing it back to Kafka +• A demo of an OpenLineage integration POC at DataDog by Jonathan, covering: + ◦ Use cases served by DataDog's Data Streams Monitoring service + ◦ OpenLineage's potential role providing and standardizing cross-platform lineage for DataDog's monitoring platform. +Thanks to Microsoft for providing the space. +If you're interested in attending, presenting at, or hosting a future meetup, please reach out.

+ +
+ + + + + + + + + +
+
+ + + + + + + + + +
+ + +
+ 🙌 Jonathan Morin, Harel Shein, Rodrigo Maia, Maciej Obuchowski +
+ +
+ :datadog: Harel Shein, Paweł Leszczyński, Rodrigo Maia, Maciej Obuchowski, Jean-Mathieu Saponaro +
+ +
+ 👏 Peter Huang, Rodrigo Maia, tati +
+ +
+ 🎉 tati +
+ +
+ ❤️ Sheeri Cabral (Collibra) +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-25 07:08:21
+
+

*Thread Reply:* Hey @Michael Robinson, was the meetup recorded?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-25 09:26:04
+
+

*Thread Reply:* @Maciej Obuchowski yes, and a clip is on YouTube. Hoping to have @Jonathan Morin’s clip posted soon, as well

+
+
YouTube
+ +
+ + + } + + OpenLineage Project + (https://www.youtube.com/@openlineageproject6897) +
+ + + + + + + + + + + + + + + + + +
+ + + +
+ ❤️ Sheeri Cabral (Collibra) +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Stefan Krawczyk + (stefan@dagworks.io) +
+
2024-03-22 19:57:48
+
+

Airflow 2.8.3, Python 3.11 +Trying to do a hello-world lineage example using a simple BashOperator DAG — but nothing is being emitted to my Marquez backend. +I'm running Airflow locally following the docker-compose setup here. +More details in thread:

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Stefan Krawczyk + (stefan@dagworks.io) +
+
2024-03-22 19:59:45
+
+

*Thread Reply:* Here is my airflow.cfg under +```[webserver] +expose_config = 'True'

+ +

[openlineage] +config_path = '' +transport = '{"type": "http", "url": "http://localhost:5002", "endpoint": "api/v1/lineage"}' +disabled_for_operators = ''```

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Stefan Krawczyk + (stefan@dagworks.io) +
+
2024-03-22 20:01:15
+
+

*Thread Reply:* I can curl my marquez backend just fine — but yeah not seeing anything emitted by airflow

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Stefan Krawczyk + (stefan@dagworks.io) +
+
2024-03-22 20:19:44
+
+

*Thread Reply:* Have I missed something in the set-up? Is there a way I can validate the config was ingested correctly?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-23 03:42:40
+
+

*Thread Reply:* Can you see any logs related to OL in Airflow? Is Marquez in the same docker compose? Maybe try changing to host.docker.internal from localhost

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Stefan Krawczyk + (stefan@dagworks.io) +
+
2024-03-24 00:51:31
+
+

*Thread Reply:* So I figured it out. For reference, the issue was that ./config wasn't a drop-in for airflow.cfg as I had blindly interpreted it to be. Instead, setting the OpenLineage values as environment variables worked.
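A sketch of the environment-variable approach (values assumed from the airflow.cfg posted earlier in the thread; host.docker.internal is an assumption for when Airflow runs in Docker and Marquez on the host):

```shell
# Hedged sketch: OpenLineage provider config via Airflow's
# AIRFLOW__<SECTION>__<KEY> environment-variable convention,
# instead of editing airflow.cfg.
export AIRFLOW__OPENLINEAGE__TRANSPORT='{"type": "http", "url": "http://host.docker.internal:5002", "endpoint": "api/v1/lineage"}'
export AIRFLOW__OPENLINEAGE__CONFIG_PATH=''
export AIRFLOW__OPENLINEAGE__DISABLED_FOR_OPERATORS=''
```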

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Stefan Krawczyk + (stefan@dagworks.io) +
+
2024-03-24 01:01:47
+
+

*Thread Reply:* Otherwise, for the simple DAG with just BashOperators, I was expecting to see a similar "lineage" DAG in Marquez, but I only see individual jobs. Is that expected?

+ +

Formulating my question differently: does the OpenLineage data model assume a bipartite-type graph of Job -> Dataset -> Job -> Dataset, etc., always? It seems like there would be cases where you could have Job -> Job with no explicit "data artifact" produced.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Stefan Krawczyk + (stefan@dagworks.io) +
+
2024-03-24 02:13:30
+
+

*Thread Reply:* Another question — is there going to be integration with the “datasets” & inlets/outlets concept airflow now has? +E.g. I would expect the OL integration to capture this:

+ +

```# [START dataset_def] +dag1_dataset = Dataset("", extra={"hi": "bye"})

+ +

[END dataset_def]

+ +

with DAG( + dag_id="dataset_produces_1", + catchup=False, + start_date=pendulum.datetime(2021, 1, 1, tz="UTC"), + schedule="@daily", + tags=["produces", "dataset-scheduled"], +) as dag1: + # [START task_outlet] + BashOperator(outlets=[dag1_dataset], task_id="producing_task_1", bash_command="sleep 5") + # [END task_outlet]``` +i.e. the outlets part. Currently it doesn't seem to.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-25 03:47:29
+
+

*Thread Reply:* OL only converts File and Table entities from manual inlets and outlets so far

+ + + +
+ 👍 Stefan Krawczyk +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-25 05:00:22
+
+

*Thread Reply:* On the Job -> Dataset -> Job -> Dataset question: OL and Marquez do not aim to reflect Airflow DAGs. Rather, they focus on exposing the metadata that is collected around data processing

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Stefan Krawczyk + (stefan@dagworks.io) +
+
2024-03-25 14:27:42
+
+

*Thread Reply:* > on the Job -> Dataset -> Job -> Dataset: OL and Marquez do not aim into reflecting Airflow DAGs. They rather focus on exposing metadata that is collected around data processing +That makes sense. I was just thinking through the implications and boundaries of what "lineage" is modeled. Thanks

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Suhas Shenoy + (ksuhasshenoy@gmail.com) +
+
2024-03-25 06:18:05
+
+

Hi Team... We have a use case where we want to know when a column of a table gets updated in BigQuery, and we have some questions related to it.

+ +
  1. In some of the openlineage events that are generated, outputs.facets.columnLineage is null. Can we assume all the columns get updated when this is the case?
  2. Also outputs.facets.schema seems to be null in some of the events generated. How do we get the schema of the table in this case?
  3. output.namespace is also null in some cases. How do we determine output datasource in this case?
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-25 07:07:02
+
+

*Thread Reply:* For BigQuery, we use the BigQuery API to get the lineage, which unfortunately does not present us with column-level lineage. Adding that would be a new feature.

+ +

For 2. and 3. it might happen that the result you're reading is from query cache, as this was earlier executed and not changed - in that case we won't have full information yet. https://cloud.google.com/bigquery/docs/cached-results

+
+
Google Cloud
+ + + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Suhas Shenoy + (ksuhasshenoy@gmail.com) +
+
2024-03-25 07:45:04
+
+

*Thread Reply:* So, can we assume that if the query is not a duplicate one, fields outputs.facets.schema and output.namespace will not be empty? +And ignore the COMPLETE events when those fields are empty as they are not providing any new updates?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-25 07:59:55
+
+

*Thread Reply:* > So, can we assume that if the query is not a duplicate one, fields outputs.facets.schema and output.namespace will not be empty? +Yes, I would assume so. +> And ignore the COMPLETE events when those fields are empty as they are not providing any new updates? +That probably depends on your use case, different jobs can access same tables/do same queries in that case.
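One way to apply that filtering, as a stdlib sketch (the event dicts are simplified stand-ins for real OpenLineage COMPLETE events, not the full spec):

```python
# Hedged sketch: treat a COMPLETE event as informative only if at least one
# output carries both a namespace and a schema facet; cached-result events
# that lack them can be skipped for this particular use case.
def has_output_metadata(event: dict) -> bool:
    for out in event.get("outputs") or []:
        if out.get("namespace") and (out.get("facets") or {}).get("schema"):
            return True
    return False
```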

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Suhas Shenoy + (ksuhasshenoy@gmail.com) +
+
2024-03-25 23:49:46
+
+

*Thread Reply:* Okay. We wanted to know how we can determine the output datasource from the events.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Ruchira Prasad + (ruchiraprasad@gmail.com) +
+
2024-03-26 01:51:15
+
+

Hi Team, +Currently OpenLineage/Marquez uses a Postgres DB to store the metadata. Instead of Postgres, we want to store it in Snowflake. Is there any kind of built-in configuration in the Marquez application to change the Marquez database to Snowflake? If not, what would the approach be?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-03-26 04:50:25
+
+

*Thread Reply:* The last time I looked at Marquez (July last year), it was highly coupled to PostgreSQL-specific functionality. It had code, particularly for the graph traversal, written in PostgreSQL's PL/pgSQL. Furthermore, it uses PostgreSQL as an OLTP database. My limited knowledge of Snowflake says that it is an OLAP database, which means it would be a very poor fit for the application. Any migration to another database engine would be a large undertaking.

+ + + +
+ ☝️ Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-03-26 05:13:25
+
+

*Thread Reply:* Hi @Ruchira Prasad, this is not possible at the moment. Marquez splits OL events into neat relational model to allow efficient lineage queries. I don't think this would be achievable in Snowflake.

+ +

As an alternative approach, you can try the fluentd proxy -> https://github.com/OpenLineage/OpenLineage/tree/main/proxy/fluentd +Fluentd provides a bunch of useful output plugins that let you send logs into several warehouses (https://www.fluentd.org/plugins), though I cannot find Snowflake on the list.

+ +

On the snowflake side, there is quickstart on how to ingest fluentd logs into it -> https://quickstarts.snowflake.com/guide/integrating_fluentd_with_snowflake/index.html#0

+ +

To wrap up: if you need lineage events in Snowflake, you can consider sending events to a FluentD endpoint and then loading them into Snowflake. In contrast to Marquez, you will query raw events, which may be cumbersome in some cases, like getting several OL events that describe a single run.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-26 05:56:39
+
+

*Thread Reply:* Note that supporting (not even migrating) a backend application that can use multiple database engines comes at a huge opportunity cost, and it's not like Marquez has more contributors than it needs 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Ruchira Prasad + (ruchiraprasad@gmail.com) +
+
2024-03-26 06:28:47
+
+

*Thread Reply:* Since both Postgres and Snowflake support JDBC, can't we point to Snowflake by changing the following?

+ +
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-03-26 06:29:16
+
+

*Thread Reply:* It doesn't have anything to do with the driver. JDBC is the driver; it defines the protocol that the communication link must abide by.

+ +

Just like how ODBC is a driver, and in the .NET world, how OLE DB is a driver.

+ +

It tells us nothing about the capabilities of the database. In this case, using PostgreSQL was chosen because of its capabilities, and because of those capabilities, the application code leverages more of those capabilities than just a generic read / write database. Moving all that logic from PostgreSQL PL/pgSQL to the application would (1) take a significant investment in time; (2) present bugs; (3) slow down the application response time, because you have to make many more round-trips to the database, instead of keeping the code close to the data.

+ + + +
+ ☝️ Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-03-26 06:39:57
+
+

*Thread Reply:* If you're still curious, and want to test things out for yourself:

+ +
  1. Create a graph structure on a SQL database (edge table, vertex table, relationship table)
  2. Write SQL to perform that traversal
  3. Write Java application code that reads from the database, then tries to perform traversals by again reading data from the database. +Measure the performance impact, and you will see that (2) is far quicker than (3). This is one of the reasons why Marquez uses PostgreSQL and leverages its PL/pgSQL capabilities, because otherwise the application would be significantly slower for any traversal that is more than a few levels deep.
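The contrast in steps 2 and 3 above can be sketched with SQLite standing in for PostgreSQL (a recursive CTE instead of PL/pgSQL; the table and column names are illustrative, not Marquez's actual schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (parent TEXT, child TEXT)")
conn.executemany("INSERT INTO edges VALUES (?, ?)",
                 [("a", "b"), ("b", "c"), ("c", "d")])

# (2) the traversal runs inside the database: one round-trip
in_db = [r[0] for r in conn.execute("""
    WITH RECURSIVE downstream(node) AS (
        SELECT 'a'
        UNION
        SELECT e.child FROM edges e JOIN downstream d ON e.parent = d.node
    )
    SELECT node FROM downstream
""")]

# (3) the application drives the traversal: one query per depth level
frontier, seen = ["a"], {"a"}
while frontier:
    marks = ",".join("?" * len(frontier))
    children = [r[0] for r in conn.execute(
        f"SELECT child FROM edges WHERE parent IN ({marks})", frontier)]
    frontier = [c for c in children if c not in seen]
    seen.update(frontier)

print(in_db)        # both approaches reach the same set of nodes,
print(sorted(seen)) # but (3) pays a round-trip per depth level
```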
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Bipan Sihra + (bsihra@amazon.com) +
+
2024-03-26 15:57:50
+
+

Hi Team,

+ +

Looking for feedback on the below Problem and Proposal.

+ +

We are using OpenLineage with our AWS EMR clusters to extract lineage and send it to a backend Marquez deployment (also in AWS). This is working fine and we are getting table and column level lineage.

+ +

Problem: What we are seeing: +• 15+ OpenLineage events, with multiple jobs being shown in Marquez for a single Spark job in EMR. This causes confusion because team members using Marquez are unsure which "job" in Marquez to look at. +• The S3 locations are being populated in the namespace. We wanted to use the namespace for teams. However, having S3 locations in the namespace in a way "pollutes" the list. +I understand the above are not issues/bugs. However, our users want us to "clean" up the Marquez UI.

+ +

Proposal: One idea was to have a Lambda intercept the 10-20 raw OpenLineage events from EMR and then process -> condense them down to 1 event with the job, run, inputs, outputs. And secondly, to swap out the namespace from S3 to actual team names via a lookup we would host ourselves.

+ +

While the above proposal could technically work, we wanted to check with the team here whether it makes sense, along with any caveats or alternatives others have used. Ideally, we don't want to own parsing OpenLineage events if there is an existing solution.
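A stdlib sketch of the condensing step in the proposal (event shapes are simplified stand-ins for real OpenLineage events; the team lookup is an assumed in-memory dict rather than the hosted service mentioned above):

```python
# Hedged sketch: fold the many per-run OpenLineage events into one summary
# per runId, remapping s3:// namespaces to team names via an assumed lookup.
TEAM_BY_NAMESPACE = {"s3://my-bucket": "team-analytics"}  # hypothetical mapping

def condense(events):
    by_run = {}
    for e in events:
        run_id = e["run"]["runId"]
        agg = by_run.setdefault(run_id, {"job": None, "inputs": set(), "outputs": set()})
        agg["job"] = e["job"]["name"]
        for side in ("inputs", "outputs"):
            for ds in e.get(side, []):
                ns = TEAM_BY_NAMESPACE.get(ds["namespace"], ds["namespace"])
                agg[side].add((ns, ds["name"]))
    return by_run
```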

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Bipan Sihra + (bsihra@amazon.com) +
+
2024-03-26 15:58:15
+
+

*Thread Reply:* Screenshot: 1 spark job = multiple "jobs" in Marquez

+ +
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Bipan Sihra + (bsihra@amazon.com) +
+
2024-03-26 15:58:35
+
+

*Thread Reply:* Screenshot: S3 locations in namespace.

+ +
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-26 16:59:48
+
+

*Thread Reply:* Hi @Bipan Sihra, thanks for posting this -- it's exciting to hear about your use case at Amazon! I wonder if you wouldn't mind opening a GitHub issue so we can track progress on this and make sure you get answers to your questions.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-26 17:23:19
+
+

*Thread Reply:* Also, would you please share the version of openlineage-spark you are on?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Bipan Sihra + (bsihra@amazon.com) +
+
2024-03-27 09:05:09
+
+

*Thread Reply:* Hi @Michael Robinson. Sure, I can open a GitHub issue. +Also, we are currently using io.openlineage:openlineage-spark_2.12:1.9.1.

+ + + +
+ 👍 Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Tristan GUEZENNEC -CROIX- + (tristan.guezennec@decathlon.com) +
+
2024-03-28 09:51:12
+
+

*Thread Reply:* @Yannick Libert

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Bipan Sihra + (bsihra@amazon.com) +
+
2024-03-28 09:52:43
+
+

*Thread Reply:* I was able to find info I needed here: https://github.com/OpenLineage/OpenLineage/discussions/597

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Ranvir Singh + (ranvir.tune@gmail.com) +
+
2024-03-27 07:55:37
+
+

Hi Team, we are trying to collect lineage for a Spark job using OpenLineage (v1.8.0) and Marquez (v0.46). We can see the "Schema" details for all "Datasets" created, but we can't see column-level lineage and are getting "Column lineage not available for the specified dataset" on the Marquez UI under the "COLUMN LINEAGE" tab.

+ +

About the Spark job: it reads data from a few Oracle tables over JDBC connections as temp views in Spark, performs some transformations (joins & aggregations) over different steps, creating intermediate temp views, and finally writes the data to an HDFS location. So, it looks something like this:

+ +

Read Oracle tables as temp views -&gt; transformations set1 -&gt; creation of a few more temp views from previously created temp views -&gt; transformations set2, set3 ... -&gt; finally writing to HDFS (when all the temp views get materialised in-memory to create the final output dataset). +We are getting the schema details for the finally written dataset but no column-level lineage for it. Also, while checking the lineage JSON data, I can see "" (blank) for the "inputs" key (just before the "outputs" key, which contains the dataset name & other details in nested key-value form). As per my understanding, this explains the null value for the "columnLineage" key, and hence no column-level lineage, but I'm unable to understand why!

+ +

We'd appreciate it if you could share some thoughts/ideas on what is going wrong here, as we are stuck on this point. Also, we're not sure whether we can get column-level lineage only for datasets created from permanent Hive tables, and not for temp/unmaterialised views, using OpenLineage & Marquez.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-27 08:54:38
+
+

*Thread Reply:* My first guess would be that either some interaction between JDBC/views/materialization makes the CLL not show, or possibly the transformations - if you're doing stuff like UDFs, we lose the column-level info. But it's hard to confirm without seeing events and/or some minimal reproduction

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Ranvir Singh + (ranvir.tune@gmail.com) +
+
2024-03-29 08:48:03
+
+

*Thread Reply:* Hi @Maciej Obuchowski, thanks for responding on this. +We are using Spark SQL, where we are reading the data from Oracle tables as temp tables and then running SQL-like queries (for transformation) on the previously created temp tables. +Now, let's say we want to run a set of transformations written as SQL-like queries. When the first query (query1) gets executed, it creates temptable1; then query2 gets executed on temptable1, creating temptable2, and so on. For this use case, we have developed a custom function that takes these queries (query1, query2, ...) as input, runs them iteratively, and creates temptable1, temptable2, ... and so on. This custom function uses RDD APIs and built-in functions like collect(), along with a few other Scala functions. So, we're not sure whether the usage of RDDs breaks the lineage or what's going wrong. +Lastly, we do have jobs that use UDFs directly in Spark, but we aren't getting CLL even for the jobs that don't have any UDF usage. +Hope this gives some context on how we are running the job.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Ranvir Singh + (ranvir.tune@gmail.com) +
+
2024-04-04 13:08:32
+
+

*Thread Reply:* Hey @Maciej Obuchowski, appreciate your help/comments on this.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
George Tong + (george@terradot.earth) +
+
2024-03-27 14:53:44
+
+

Hey everyone 👋

+ +

I’m working at a carbon capture 🌍 company and we’re designing how we want to store data in our PostgreSQL database at the moment. One of the key things we’re focusing on is traceability and transparency of data, as well as ability to edit and maintain historical data. This is key as if we make an error and need to update a previous data point, we want to know everything downstream of that data point that needs to be rerun and recalculated. You might be able to guess where this is going… +• Any advice on how we should be designing our table schemas to support editing and traceability? We’re currently looking using temporal tables +• Is Open Lineage the right tool for downstream tracking and traceability? Are there any other tools we should be looking at instead? +I’m new here so hopefully I asked in the right channel. Let me know if I should be asking elsewhere!

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Kacper Muda + (kacper.muda@getindata.com) +
+
2024-03-28 05:55:48
+
+

*Thread Reply:* Hey, In my opinion, OpenLineage is the right tool for what you are describing. Together with some backend like Marquez it will allow you to visualize data flow, dependencies (upstreams, downstreams) and more 🙂

+ + + +
+ 🙌 George Tong +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-28 15:54:58
+
+

*Thread Reply:* Hi George, welcome! To add to what Kacper said, I think it also depends on what you are looking for in terms of "transparency." I guess I'm wondering exactly what you mean by this. A consumer using the OpenLineage standard (like Marquez, which we recommend in general but especially for getting started) will collect metadata about your pipelines' datasets and jobs but won't collect the data itself or support editing of your data. You're probably fully aware of this, but it's a point of confusion sometimes, and since you mentioned transparency and updating data I wanted to emphasize this. I hope this helps!

+ + + +
+ 🙌 George Tong +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
George Tong + (george@terradot.earth) +
+
2024-03-28 19:28:36
+
+

*Thread Reply:* Thanks for the thoughts, folks! Yes, my thinking is starting to become more concrete - retaining a history of data and ensuring that you can always go back to your data at a certain point in time is different from understanding the downstream impact of a data change (which is what OpenLineage seems to tackle)

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Anirudh Shrinivason + (anirudh.shrinivason@grabtaxi.com) +
+
2024-03-28 03:18:42
+
+

Hi team, so we're using OL v1.3.1 on Databricks, on a non-terminating cluster. We're seeing that the heap memory is increasing very significantly, and we notice that the majority of the memory comes from OL. Any idea if we're hitting a memory leak in OL? Have any similar issues been reported before? Thanks!

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-28 10:43:36
+
+

*Thread Reply:* First idea would be to bump version 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-03-28 10:56:55
+
+

*Thread Reply:* Does it affect all the jobs or just some of them? Does it somehow correlate with the number of Spark tasks a job is processing? Would you be able to test the behaviour with a jar built from the branch? Any other details that would help reproduce this would be nice.

+ +

So many questions for a start... Happy to see you again @Anirudh Shrinivason. Can't wait to look into this next week.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-03-28 11:12:23
+
+

*Thread Reply:* FYI - this is my experience as discussed on Tuesday @Paweł Leszczyński @Maciej Obuchowski

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Anirudh Shrinivason + (anirudh.shrinivason@grabtaxi.com) +
+
2024-04-01 05:31:09
+
+

*Thread Reply:* Hey @Maciej Obuchowski @Paweł Leszczyński Thanks for the questions! Here are some details and clarifications I have:

+ +
  1. First idea would be to bump version Has such an issue been fixed in later versions? So is this an already known issue with the 1.3.1 version? Just curious why bumping it might resolve the issue...
  2. Does it affect all the jobs or just some of them So far, we're monitoring the heap at the cluster level... It's a shared non-terminating cluster. I'll try to take a look at the job level to get some more insights.
  3. Does it somehow correlate with amount of spark tasks a job is processing This was my initial thought too, but from looking at a few of the pipelines, they seem relatively straightforward logic wise. And I don't think it's because a lot of tasks are running in parallel causing the amount of allocated objects to be very high... (Let me check back on this)
  4. Any other details helping to reproduce this would be nice. Yes! Let me try to dig a little more, and try to get back with more details...
  5. FYI - this is my experience as discussed on Tuesday Hi @Damien Hawes may I check if there is anywhere I could get some more information on your observations? Since it seems related, maybe they're the same issues? +But all in all, I ran a high level memory analyzer, and it seemed to look like a memory leak from the OL jar... We noticed the heap size from OL almost monotonically increasing to >600mb... +I'll try to check and do a bit more analysis before getting back with more details. :gratitudethankyou:
+
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Anirudh Shrinivason + (anirudh.shrinivason@grabtaxi.com) +
+
2024-04-02 00:52:32
+
+

*Thread Reply:* This is what the heap dump looks like after 45 mins btw... ~11gb from openlineage out of 14gb heap

+ + + + +
+ ❤️ Paweł Leszczyński, Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-02 03:34:50
+
+

*Thread Reply:* Nice. That's slightly different to my experience. We're running a streaming pipeline on a conventional Spark cluster (not databricks).

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 04:56:13
+
+

*Thread Reply:* OK. I've found the bug. I will create an issue for it.

+ +

cc @Maciej Obuchowski @Paweł Leszczyński

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 04:59:49
+
+

*Thread Reply:* Great. I am also looking into unknown facet. I think this could be something like this -> https://github.com/OpenLineage/OpenLineage/pull/2557/files

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 05:00:25
+
+

*Thread Reply:* Not quite.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 05:01:00
+
+

*Thread Reply:* The problem is that the UnknownEntryFacetListener accumulates state, even if the spark_unknown facet is disabled.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 05:01:41
+
+

*Thread Reply:* The problem is that the code eagerly calls UnknownEntryFacetListener#apply

+ + + +
+ 🙌 Paweł Leszczyński +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 05:01:54
+
+

*Thread Reply:* Without checking if the facet is disabled or not.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 05:02:17
+
+

*Thread Reply:* It only checks whether the facet is disabled or not, when it needs to add the details to the event.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 05:03:40
+
+

*Thread Reply:* Furthermore, even if the facet is enabled, it never clears its state.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 05:04:31
+
+

*Thread Reply:* yes, and if logical plan is spark.createDataFrame with local data, this can get huge

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 06:01:10
+
+

*Thread Reply:* https://github.com/OpenLineage/OpenLineage/issues/2561

+ + + +
+ 👍 Paweł Leszczyński +
+ +
+
+
+
+ + + + + +
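The accumulation bug described in this thread can be illustrated with a language-agnostic sketch (Python here; the class and method names are illustrative, not the actual OpenLineage Java code):

```python
class UnknownEntryListener:
    """Sketch of a facet listener that records every visited plan node."""

    def __init__(self, facet_disabled: bool):
        self.facet_disabled = facet_disabled
        self.visited = []  # grows on every call unless guarded and cleared

    def apply_leaky(self, node):
        # Bug pattern: state accumulates even when the facet is disabled,
        # and nothing ever clears it on a long-lived cluster.
        self.visited.append(node)

    def apply_fixed(self, node):
        # Fix 1: respect the disabled flag before storing anything.
        if not self.facet_disabled:
            self.visited.append(node)

    def emit(self):
        # Fix 2: release accumulated state once the event has been built.
        snapshot = list(self.visited)
        self.visited.clear()
        return snapshot

leaky = UnknownEntryListener(facet_disabled=True)
for node in range(1000):
    leaky.apply_leaky(node)
print(len(leaky.visited))  # 1000 nodes retained despite the facet being off

fixed = UnknownEntryListener(facet_disabled=True)
for node in range(1000):
    fixed.apply_fixed(node)
print(len(fixed.visited))  # 0
```

On a non-terminating cluster the leaky variant grows without bound, which matches a monotonically increasing heap dominated by listener state.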
+
+ + + + +
+ +
Anirudh Shrinivason + (anirudh.shrinivason@grabtaxi.com) +
+
2024-04-03 06:20:51
+
+

*Thread Reply:* 🙇

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Tom Linton + (tom.linton@atlan.com) +
+
2024-03-28 21:50:01
+
+

Hello All - I've begun my OL journey rather recently and am running into trouble getting lineage going in an airflow job. I spun up a quick flask server to accept and print the OL requests. It appears that there are no Inputs or Outputs. Is that something I have to set in my DAG? Reference code and responses are attached.

+ +
+ + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-29 03:38:18
+
+

*Thread Reply:* hook-level lineage is not yet supported, you should use SnowflakeOperator instead

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Tom Linton + (tom.linton@atlan.com) +
+
2024-03-29 08:53:29
+
+

*Thread Reply:* Thanks @Jakub Dardziński! I used the hook because it looks like that is the supported operator based on airflow docs

+ +
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-29 09:20:10
+
+

*Thread Reply:* you can see this is under SQLExecuteQueryOperator +without going into the details, part of the implementation is on the hook side there, not the operator

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Vinnakota Priyatam Sai + (vinnakota.priyatam@walmart.com) +
+
2024-03-29 00:14:17
+
+

Hi team, we are collecting OpenLineage events across different jobs where the output datasources are BQ, Cassandra and Postgres. We are mostly interested in the freshness of columns across these different datasources. Using OpenLineage COMPLETE event's dataset.datasource and dataset.schema we want to understand which columns are updated at what time.

+ +

We have a few questions related to BQ (as output dataset) events:

+ +
  1. How to identify if the output datasource is BQ, Cassandra or Postgres?
  2. Can we rely on dataset.datasource and dataset.schema for BQ table name and column names?
  3. Even if one column is updated, do we get all the column details in dataset.schema?
  4. If dataset.datasource or dataset.schema value is null, can we assume that no column has been updated in that event?
  5. Are there any sample BQ events that we can refer to understand the events?
  6. Is it possible to get columnLineage details for BQ as output datasource?
+
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-29 10:11:28
+
+

*Thread Reply:* &gt; 1. How to identify if the output datasource is BQ, Cassandra or Postgres? +The dataset namespace would contain that information: for example, the namespace for BQ would be simply bigquery and for Postgres it would be postgres://{host}:{port}

+ + + +
+
+
+
+ + + + + +
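The namespace convention makes the datasource mechanically detectable; a small helper sketch (the scheme list is illustrative, following the naming pattern described above):

```python
def classify_datasource(namespace: str) -> str:
    """Infer the backing datasource from an OpenLineage dataset namespace."""
    if namespace == "bigquery":
        return "bigquery"
    for scheme in ("postgres://", "cassandra://", "kafka://", "s3://"):
        if namespace.startswith(scheme):
            return scheme.rstrip(":/")
    return "unknown"

print(classify_datasource("bigquery"))                      # bigquery
print(classify_datasource("postgres://db.internal:5432"))   # postgres
```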
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-29 10:15:06
+
+

*Thread Reply:* > 1. Can we rely on dataset.datasource and dataset.schema for BQ table name and column names? +> 2. Even if one column is updated, do we get all the column details in dataset.schema? +> 3. If dataset.datasource or dataset.schema value is null, can we assume that no column has been updated in that event? +If talking about BigQuery Airflow operators, the known issue is BigQuery query caching. You're guaranteed to get this information if the query is running for the first time, but if the query is just reading from the cache instead of being executed, we don't get that information. That would result in a run without actual input dataset data.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-29 10:15:56
+
+

*Thread Reply:* &gt; 1. Is it possible to get columnLineage details for BQ as output datasource? +The BigQuery API does not give us this information yet - we could augment the API data with the SQL parser's output, though. That feature doesn't exist yet.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Vinnakota Priyatam Sai + (vinnakota.priyatam@walmart.com) +
+
2024-03-29 10:18:32
+
+

*Thread Reply:* This is very helpful, thanks a lot @Maciej Obuchowski

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mark Dunphy + (markd@spotify.com) +
+
2024-03-29 11:54:02
+
+

Hi all, we are trying to use dbt-ol to capture lineage. We use dbt custom aliases based on the --target flag passed in to dbt-ol run. So for example if using --target dev the model alias might be some_prefix__model_a whereas with --target prod the model alias might be model_a without any prefix. OpenLineage doesn't seem to pick up on this custom alias and sends model_a regardless in the input/output. Is this intended? I'm relatively new to this data world so it is possible I'm missing something basic here.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-29 15:52:17
+
+

*Thread Reply:* Welcome and thanks for using OpenLineage! Someone with dbt expertise will reply soon.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-29 18:21:35
+
+

*Thread Reply:* looks like it’s another entry in manifest.json : https://schemas.getdbt.com/dbt/manifest/v10.json

+ +

called alias that is not taken into consideration

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-29 18:22:24
+
+

*Thread Reply:* it needs more analysis whether and how this entry is set

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-29 18:30:06
+
+

*Thread Reply:* btw how do you create alias per target? I did this:

+ +
-- Use the `ref` function to select from other models
+{% if target.name != 'prod' %}
+{{ config(materialized='incremental',unique_key='id',
+        on_schema_change='sync_all_columns', alias='third_model_dev'
+) }}
+{% else %}
+{{ config(materialized='incremental',unique_key='id',
+        on_schema_change='sync_all_columns', alias='third_model_prod'
+) }}
+{% endif %}
+
+select x.id, lower(y.name)
+from {{ ref('my_first_dbt_model') }} as x
+left join {{ ref('my_second_dbt_model' )}} as y
+ON x.id = y.i
+
+ +

but I’m curious if that’s the correct scenario to test

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mark Dunphy + (markd@spotify.com) +
+
2024-04-01 09:31:26
+
+

*Thread Reply:* thanks for looking into this @Jakub Dardziński! we are using the generate_alias_name macro to control this. our macro looks very similar to this example

+ + + +
+
+
+
+ + + + + +
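A generate_alias_name override typically follows the pattern in the dbt documentation; below is a sketch only (the some_prefix__ prefix and target names are illustrative, not the actual macro in use here):

```sql
-- macros/generate_alias_name.sql (sketch, assuming the common dbt pattern)
{% macro generate_alias_name(custom_alias_name=none, node=none) -%}
    {%- set base = (custom_alias_name | trim) if custom_alias_name is not none else node.name -%}
    {%- if target.name == 'prod' -%}
        {{ base }}
    {%- else -%}
        some_prefix__{{ base }}
    {%- endif -%}
{%- endmacro %}
```

Because the alias is computed at compile time, a lineage integration that reads only the node name from manifest.json (rather than the alias field) will report model_a for both targets.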
+
+ + + + +
+ +
Tom Linton + (tom.linton@atlan.com) +
+
2024-03-29 12:37:48
+
+

Is it possible to configure OL to only send OL Events for certain dags in airflow?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-29 14:22:30
+
+

*Thread Reply:* it will be possible once latest version of OL provider is released with this PR: +https://github.com/apache/airflow/pull/37725

+
+ + + + + + + +
+
Labels
+ area:providers, area:dev-tools, kind:documentation, provider:openlineage +
+ + + + + + + + + + +
+ + + +
+ ✅ Tom Linton +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Tom Linton + (tom.linton@atlan.com) +
+
2024-03-29 16:09:16
+
+

*Thread Reply:* Thanks!

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Tom Linton + (tom.linton@atlan.com) +
+
2024-03-29 13:10:52
+
+

Is it common to see this error?

+ +
+ + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-29 17:32:07
+
+

*Thread Reply:* seems like trim in select statements causes issues

+ + + +
+ ✅ Tom Linton +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-04-01 10:04:45
+
+

@channel +I'd like to open a vote to release OpenLineage 1.11.0, including: +• Spark: lineage metadata extraction built-in to Spark extensions +• Spark: change SparkPropertyFacetBuilder to support recording Spark runtime config +• Java client: add metrics-gathering mechanism +• Flink: support Flink 1.19.0 +• SQL: show error message when OpenLineageSql cannot find native library +Three +1s from committers will authorize. Thanks!

+ + + +
+ ➕ Harel Shein, Rodrigo Maia, Jakub Dardziński, alexandre bergere, Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-04-04 09:44:38
+
+

*Thread Reply:* Thanks, all. The release is authorized and will be performed within 2 business days excluding tomorrow.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-04-01 16:13:24
+
+

@channel +The latest issue of OpenLineage News is available now, featuring a rundown of upcoming and recent events, recent releases, updates to the Airflow Provider, open proposals, and more. +To get the newsletter directly in your inbox each month, sign up here. +openlineage.us14.list-manage.com

+
+
openlineage.us14.list-manage.com
+ + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Pooja K M + (pooja.km@philips.com) +
+
2024-04-02 06:01:39
+
+

Hi all, we are trying to transform entities according to the medallion model, where each entity goes through multiple layers of data transformation. The workflow picks data from a Kafka topic, stores it as Parquet, and then transforms it into Hudi tables in the silver layer. Now we are trying to capture lineage data. So far we have tried the console transport type, but we are not seeing the lineage data in the console (we are running this job from AWS Glue). Below is the configuration we have added: +spark = (SparkSession.builder + .appName('samplelineage') + .config('spark.jars.packages', 'io.openlineage:openlineagespark:1.8.0') + .config('spark.extraListeners', 'io.openlineage.spark.agent.OpenLineageSparkListener') + .config('spark.openlineage.namespace', 'LineagePortTest') + .config('spark.openlineage.parentJobNamespace', 'LineageJobNameSpace') + .config("spark.openlineage.transport.type", "console") + .config('spark.openlineage.parentJobName', 'LineageJobName') + .getOrCreate())

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-02 07:24:13
+
+

*Thread Reply:* Does Spark tell your during startup that it is adding the listener?

+ +

The log line should be something like "Adding io.openlineage.spark.agent.OpenLineageSparkListener"

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-02 07:24:58
+
+

*Thread Reply:* Additionally, ensure your log4j.properties / log4j2.properties (depending on the version of Spark that you are using) allows io.openlineage at info level

+ + + +
+
+
+
+ + + + + +
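For the log4j 1.x syntax used by older Spark versions, a minimal fragment would be (log4j2 uses logger.&lt;name&gt;.level-style keys instead; this is a sketch):

```properties
# log4j.properties: let OpenLineage log at INFO so events and errors are visible
log4j.logger.io.openlineage=INFO
```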
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-02 08:04:16
+
+

*Thread Reply:* I think, as usual, hudi is the problem 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-02 08:04:35
+
+

*Thread Reply:* or are you just not seeing any OL logs/events?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-02 08:05:31
+
+

*Thread Reply:* as @Damien Hawes said, you should see Spark log +org.apache.spark.SparkContext - Registered listener io.openlineage.spark.agent.OpenLineageSparkListener

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Pooja K M + (pooja.km@philips.com) +
+
2024-04-02 09:24:00
+
+

*Thread Reply:* yes, I could see the mentioned logs in the console while the job runs

+ +
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Pooja K M + (pooja.km@philips.com) +
+
2024-04-02 09:30:17
+
+

*Thread Reply:* Also we are not seeing OL events

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 08:32:49
+
+

*Thread Reply:* do you see any errors or other logs that could be relevant to OpenLineage? +also, some simple reproduction might help

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Pooja K M + (pooja.km@philips.com) +
+
2024-04-03 09:06:18
+
+

*Thread Reply:* yes, we could see the log below: INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerSQLExecutionEnd

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 14:04:07
+
+

Hi all! I'm trying to set up OpenLineage with Managed Flink on AWS, but I'm getting this error:

+ +
`"throwableInformation": "io.openlineage.client.transports.HttpTransportResponseException: code: 400, response: \n\tat io.openlineage.client.transports.HttpTransport.throwOnHttpError(HttpTransport.java:151)\n\tat`
+
+ +

This is what I see in Marquez, where Flink is trying to send the OpenLineage events:

+ +

items +"message":string"The Job Result cannot be fetch..." +"_producer":string"<https://github.com/OpenLineage>..." +"_schemaURL":string"<https://openlineage.io/spec/fa>..." +"stackTrace":string"org.apache.flink.util.FlinkRuntimeException: The Job Result cannot be fetched through the Job Client when in Web Submission. at org.apache.flink.client.deployment.application.WebSubmissionJobClient.getJobExecutionResult(WebSubmissionJobClient.java:92) at

+ +

I'm passing the conf like this:

+ +

Properties props = new Properties(); +props.put("openlineage.transport.type","http"); +props.put("openlineage.transport.url","http://<marquez-ip>:5000/api/v1/lineage"); +props.put("execution.attached","true"); +Configuration conf = ConfigurationUtils.createConfiguration(props); +StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-02 14:26:12
+
+

*Thread Reply:* Hey @Francisco Morillo, which version of Marquez are you running? Streaming support was a relatively recent addition to Marquez

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 14:29:32
+
+

*Thread Reply:* So I was able to get it working locally, with Flink integrated with OpenLineage

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 14:29:43
+
+

*Thread Reply:* But once I deployed Marquez on an EC2 instance using Docker

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 14:30:16
+
+

*Thread Reply:* and had Managed Flink trying to emit events to OpenLineage, I just receive the Flink job event, but not the Kafka source / Iceberg sink

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 14:32:31
+
+

*Thread Reply:* I ran this: +$ git clone git@github.com:MarquezProject/marquez.git &amp;&amp; cd marquez

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-02 14:50:41
+
+

*Thread Reply:* hmmm, I see. You're probably running the latest version of Marquez then, so that should be ok. +Did you try the console transport first to see what the events look like?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-02 14:51:10
+
+

*Thread Reply:* kafka source and iceberg sink should be well supported for flink

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 14:54:31
+
+

*Thread Reply:* I believe there is an issue with how the conf is passed to the Flink job in Managed Flink

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-02 14:55:37
+
+

*Thread Reply:* ah, that may be the case. what are you seeing in the flink job logs?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-02 14:59:02
+
+

*Thread Reply:* I think setting execution.attached might not work when you set it this way

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-02 15:05:05
+
+

*Thread Reply:* is there an option to use regular flink-conf.yaml?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 15:48:34
+
+

*Thread Reply:* In the Flink logs I'm seeing the io.openlineage.client.transports.HttpTransportResponseException: code: 400, response: \n\tat.

+ +

In Marquez I'm seeing that the Job Result cannot be fetched.

+ +

We can't modify flink-conf in Managed Flink.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 15:49:39
+
+

*Thread Reply:*

+ + + + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 15:49:54
+
+

*Thread Reply:* this is what i see at marquez at ec2

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-02 15:50:58
+
+

*Thread Reply:* hmmm.. I'm wondering if the issue is with Marquez processing the events or the openlineage events themselves. +can you try with: +props.put("openlineage.transport.type","console"); +?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 15:51:08
+
+

*Thread Reply:* Compared to what I see locally. Locally it's the same job, just writing to a localhost Marquez, but I'm passing the OpenLineage conf through env

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 15:52:50
+
+

*Thread Reply:* @Harel Shein when set to console, where will the events be printed? Cloudwatch logs?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-02 15:53:17
+
+

*Thread Reply:* I think so, yes

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 15:53:20
+
+

*Thread Reply:* let me try

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-02 15:53:39
+
+

*Thread Reply:* the same place you're seeing your flink logs right now

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-02 15:54:12
+
+

*Thread Reply:* the same place you found that client exception

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 16:09:34
+
+

*Thread Reply:* I will post the events

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 16:09:37
+
+

*Thread Reply:* "logger": "io.openlineage.flink.OpenLineageFlinkJobListener", "message": "onJobSubmitted event triggered for flink-jobs-prod.kafka-iceberg-prod", "messageSchemaVersion": "1", "messageType": "INFO", "threadName": "Flink-DispatcherRestEndpoint-thread-4" }

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 16:09:52
+
+

*Thread Reply:* "locationInformation": "io.openlineage.flink.TransformationUtils.processLegacySinkTransformation(TransformationUtils.java:90)", "logger": "io.openlineage.flink.TransformationUtils", "message": "Processing legacy sink operator Print to System.out", "messageSchemaVersion": "1", "messageType": "INFO", "threadName": "Flink-DispatcherRestEndpoint-thread-4" }

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 16:10:08
+
+

*Thread Reply:* "locationInformation": "io.openlineage.flink.TransformationUtils.processLegacySinkTransformation(TransformationUtils.java:90)", "logger": "io.openlineage.flink.TransformationUtils", "message": "Processing legacy sink operator org.apache.flink.streaming.api.functions.sink.DiscardingSink@68d0a141", "messageSchemaVersion": "1", "messageType": "INFO", "threadName": "Flink-DispatcherRestEndpoint-thread-4" }

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 16:10:46
+
+

*Thread Reply:* "locationInformation": "io.openlineage.client.transports.ConsoleTransport.emit(ConsoleTransport.java:21)", "logger": "io.openlineage.client.transports.ConsoleTransport", "message": "{\"eventTime\":\"2024_04_02T20:07:03.30108Z\",\"producer\":\"<https://github.com/OpenLineage/OpenLineage/tree/1.10.2/integration/flink>\",\"schemaURL\":\"<https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent>\",\"eventType\":\"START\",\"run\":{\"runId\":\"cda9a0d2_6dfd_4db2_b3d0_f11d7b082dc0\"},\"job\":{\"namespace\":\"flink_jobs_prod\",\"name\":\"kafka-iceberg-prod\",\"facets\":{\"jobType\":{\"_producer\":\"<https://github.com/OpenLineage/OpenLineage/tree/1.10.2/integration/flink>\",\"_schemaURL\":\"<https://openlineage.io/spec/facets/2-0-2/JobTypeJobFacet.json#/$defs/JobTypeJobFacet>\",\"processingType\":\"STREAMING\",\"integration\":\"FLINK\",\"jobType\":\"JOB\"}}},\"inputs\":[{\"namespace\":\"<kafka://b-1.mskflinkopenlineage>.&lt;&gt;.<http://kafka.us-east-1.amazonaws.com:9092,b_3.mskflinkopenlineage.&lt;&gt;kafka.us_east_1.amazonaws.com:9092,b-2.mskflinkopenlineage.&lt;&gt;.c22.kafka.us-east-1.amazonaws.com:9092\%22,\%22name\%22:\%22temperature-samples\%22,\%22facets\%22:{\%22schema\%22:{\%22_producer\%22:\%22&lt;https://github.com/OpenLineage/OpenLineage/tree/1.10.2/integration/flink&gt;\%22,\%22_schemaURL\%22:\%22&lt;https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet&gt;\%22,\%22fields\%22:[{\%22name\%22:\%22sensorId\%22,\%22type\%22:\%22int\%22},{\%22name\%22:\%22room\%22,\%22type\%22:\%22string\%22},{\%22name\%22:\%22temperature\%22,\%22type\%22:\%22float\%22},{\%22name\%22:\%22sampleTime\%22,\%22type\%22:\%22long\%22}]}}|kafka.us_east_1.amazonaws.com:9092,b-3.mskflinkopenlineage.&lt;&gt;kafka.us-east-1.amazonaws.com:9092,b_2.mskflinkopenlineage.&lt;&gt;.c22.kafka.us_east_1.amazonaws.com:9092\",\"name\":\"temperature_samples\",\"facets\":{\"schema\":{\"_producer\":\"&lt;https://github.com/OpenLineage/OpenLinea
ge/tree/1.10.2/integration/flink&gt;\",\"_schemaURL\":\"&lt;https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet&gt;\",\"fields\":[{\"name\":\"sensorId\",\"type\":\"int\"},{\"name\":\"room\",\"type\":\"string\"},{\"name\":\"temperature\",\"type\":\"float\"},{\"name\":\"sampleTime\",\"type\":\"long\"}]}}>}],\"outputs\":[{\"namespace\":\"<s3://iceberg-open-lineage-891377161433>\",\"name\":\"/iceberg/open_lineage.db/open_lineage_room_temperature_prod\",\"facets\":{\"schema\":{\"_producer\":\"<https://github.com/OpenLineage/OpenLineage/tree/1.10.2/integration/flink>\",\"_schemaURL\":\"<https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet>\",\"fields\":[{\"name\":\"room\",\"type\":\"STRING\"},{\"name\":\"temperature\",\"type\":\"FLOAT\"},{\"name\":\"sampleCount\",\"type\":\"INTEGER\"},{\"name\":\"lastSampleTime\",\"type\":\"TIMESTAMP\"}]}}}]}",

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 16:11:12
+
+

*Thread Reply:* locationInformation": "io.openlineage.flink.tracker.OpenLineageContinousJobTracker.startTracking(OpenLineageContinousJobTracker.java:100)", "logger": "io.openlineage.flink.tracker.OpenLineageContinousJobTracker", "message": "Starting tracking thread for jobId=de9e0d5b5d19437910975f231d5ed4b5", "messageSchemaVersion": "1", "messageType": "INFO", "threadName": "Flink-DispatcherRestEndpoint-thread-4" }

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 16:11:25
+
+

*Thread Reply:* "locationInformation": "io.openlineage.flink.OpenLineageFlinkJobListener.onJobExecuted(OpenLineageFlinkJobListener.java:191)", "logger": "io.openlineage.flink.OpenLineageFlinkJobListener", "message": "onJobExecuted event triggered for flink-jobs-prod.kafka-iceberg-prod", "messageSchemaVersion": "1", "messageType": "INFO", "threadName": "Flink-DispatcherRestEndpoint-thread-4" }

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 16:11:41
+
+

*Thread Reply:* "locationInformation": "io.openlineage.flink.tracker.OpenLineageContinousJobTracker.stopTracking(OpenLineageContinousJobTracker.java:120)", "logger": "io.openlineage.flink.tracker.OpenLineageContinousJobTracker", "message": "stop tracking", "messageSchemaVersion": "1", "messageType": "INFO", "threadName": "Flink-DispatcherRestEndpoint-thread-4" }

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 16:12:07
+
+

*Thread Reply:* "locationInformation": "io.openlineage.client.transports.ConsoleTransport.emit(ConsoleTransport.java:21)", "logger": "io.openlineage.client.transports.ConsoleTransport", "message": "{\"eventTime\":\"2024_04_02T20:07:04.028017Z\",\"producer\":\"<https://github.com/OpenLineage/OpenLineage/tree/1.10.2/integration/flink>\",\"schemaURL\":\"<https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent>\",\"eventType\":\"FAIL\",\"run\":{\"runId\":\"cda9a0d2_6dfd_4db2_b3d0_f11d7b082dc0\",\"facets\":{\"errorMessage\":{\"_producer\":\"<https://github.com/OpenLineage/OpenLineage/tree/1.10.2/integration/flink>\",\"_schemaURL\":\"<https://openlineage.io/spec/facets/1-0-0/ErrorMessageRunFacet.json#/$defs/ErrorMessageRunFacet>\",\"message\":\"The Job Result cannot be fetched through the Job Client when in Web Submission.\",\"programmingLanguage\":\"JAVA\",\"stackTrace\":\"org.apache.flink.util.FlinkRuntimeException: The Job Result cannot be fetched through the Job Client when in Web Submission.\\n\\tat org.apache.flink.client.deployment.application.WebSubmissionJobClient.getJobExecutionResult(WebSubmissionJobClient.java:92)\\n\\tat org.apache.flink.client.program.StreamContextEnvironment.getJobExecutionResult(StreamContextEnvironment.java:152)\\n\\tat org.apache.flink.client.program.StreamContextEnvironment.execute(StreamContextEnvironment.java:123)\\n\\tat org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1969)\\n\\tat com.amazonaws.services.msf.KafkaStreamingJob.main(KafkaStreamingJob.java:342)\\n\\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\\n\\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\\n\\tat java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\\n\\tat java.base/java.lang.reflect.Method.invoke(Method.java:566)\\n\\tat 
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)\\n\\tat org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)\\n\\tat org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)\\n\\tat org.apache.flink.client.deployment.application.DetachedApplicationRunner.tryExecuteJobs(DetachedApplicationRunner.java:84)\\n\\tat org.apache.flink.client.deployment.application.DetachedApplicationRunner.run(DetachedApplicationRunner.java:70)\\n\\tat org.apache.flink.runtime.webmonitor.handlers.JarRunOverrideHandler.lambda$handleRequest$3(JarRunOverrideHandler.java:239)\\n\\tat java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)\\n\\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\\n\\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\\n\\tat java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)\\n\\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\\n\\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\\n\\tat java.base/java.lang.Thread.run(Thread.java:829)\\n\"}}},\"job\":{\"namespace\":\"flink_jobs_prod\",\"name\":\"kafka-iceberg-prod\",\"facets\":{\"jobType\":{\"_producer\":\"<https://github.com/OpenLineage/OpenLineage/tree/1.10.2/integration/flink>\",\"_schemaURL\":\"<https://openlineage.io/spec/facets/2-0-2/JobTypeJobFacet.json#/$defs/JobTypeJobFacet>\",\"processingType\":\"STREAMING\",\"integration\":\"FLINK\",\"jobType\":\"JOB\"}}}}", "messageSchemaVersion": "1", "messageType": "INFO", "threadName": "Flink-DispatcherRestEndpoint-thread-4" }

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 16:15:35
+
+

*Thread Reply:* this is what I see in CloudWatch when set to console

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 16:17:50
+
+

*Thread Reply:* So it's nothing to do with Marquez but with OpenLineage and Flink

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-02 16:22:10
+
+

*Thread Reply:* hmm.. the start event actually looks pretty good to me: +{ + "eventTime": "2024-04-02T20:07:03.30108Z", + "producer": "<https://github.com/OpenLineage/OpenLineage/tree/1.10.2/integration/flink>", + "schemaURL": "<https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent>", + "eventType": "START", + "run": { + "runId": "cda9a0d2-6dfd-4db2-b3d0-f11d7b082dc0" + }, + "job": { + "namespace": "flink-jobs-prod", + "name": "kafka-iceberg-prod", + "facets": { + "jobType": { + "_producer": "<https://github.com/OpenLineage/OpenLineage/tree/1.10.2/integration/flink>", + "_schemaURL": "<https://openlineage.io/spec/facets/2-0-2/JobTypeJobFacet.json#/$defs/JobTypeJobFacet>", + "processingType": "STREAMING", + "integration": "FLINK", + "jobType": "JOB" + } + } + }, + "inputs": [ + { + "namespace": "<kafka://b-1.mskflinkopenlineage>.&lt;&gt;.<http://kafka.us-east-1.amazonaws.com:9092,b_3.mskflinkopenlineage.&lt;&gt;kafka.us_east_1.amazonaws.com:9092,b-2.mskflinkopenlineage.&lt;&gt;.c22.kafka.us-east-1.amazonaws.com:9092|kafka.us_east_1.amazonaws.com:9092,b-3.mskflinkopenlineage.&lt;&gt;kafka.us-east-1.amazonaws.com:9092,b_2.mskflinkopenlineage.&lt;&gt;.c22.kafka.us_east_1.amazonaws.com:9092>", + "name": "temperature-samples", + "facets": { + "schema": { + "_producer": "<https://github.com/OpenLineage/OpenLineage/tree/1.10.2/integration/flink>", + "_schemaURL": "<https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet>", + "fields": [ + { + "name": "sensorId", + "type": "int" + }, + { + "name": "room", + "type": "string" + }, + { + "name": "temperature", + "type": "float" + }, + { + "name": "sampleTime", + "type": "long" + } + ] + } + } + } + ], + "outputs": [ + { + "namespace": "<s3://iceberg-open-lineage-891377161433>", + "name": "/iceberg/open_lineage.db/open_lineage_room_temperature_prod", + "facets": { + "schema": { + "_producer": "<https://github.com/OpenLineage/OpenLineage/tree/1.10.2/integration/flink>", + 
"_schemaURL": "<https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet>", + "fields": [ + { + "name": "room", + "type": "STRING" + }, + { + "name": "temperature", + "type": "FLOAT" + }, + { + "name": "sampleCount", + "type": "INTEGER" + }, + { + "name": "lastSampleTime", + "type": "TIMESTAMP" + } + ] + } + } + } + ] +}

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 16:22:37
+
+

*Thread Reply:* so with that START event, should Marquez be able to build the proper lineage?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 16:22:57
+
+

*Thread Reply:* This is what I would get with Flink + Marquez locally

+ + + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-02 16:23:33
+
+

*Thread Reply:* yes, but then it looks like the flink job is failing and we're seeing this event: +{ + "eventTime": "2024-04-02T20:07:04.028017Z", + "producer": "<https://github.com/OpenLineage/OpenLineage/tree/1.10.2/integration/flink>", + "schemaURL": "<https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent>", + "eventType": "FAIL", + "run": { + "runId": "cda9a0d2-6dfd-4db2-b3d0-f11d7b082dc0", + "facets": { + "errorMessage": { + "_producer": "<https://github.com/OpenLineage/OpenLineage/tree/1.10.2/integration/flink>", + "_schemaURL": "<https://openlineage.io/spec/facets/1-0-0/ErrorMessageRunFacet.json#/$defs/ErrorMessageRunFacet>", + "message": "The Job Result cannot be fetched through the Job Client when in Web Submission.", + "programmingLanguage": "JAVA", + "stackTrace": "org.apache.flink.util.FlinkRuntimeException: The Job Result cannot be fetched through the Job Client when in Web Submission.ntat org.apache.flink.client.deployment.application.WebSubmissionJobClient.getJobExecutionResult(WebSubmissionJobClient.java:92)ntat org.apache.flink.client.program.StreamContextEnvironment.getJobExecutionResult(StreamContextEnvironment.java:152)ntat org.apache.flink.client.program.StreamContextEnvironment.execute(StreamContextEnvironment.java:123)ntat org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1969)ntat com.amazonaws.services.msf.KafkaStreamingJob.main(KafkaStreamingJob.java:342)ntat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)ntat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)ntat java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)ntat java.base/java.lang.reflect.Method.invoke(Method.java:566)ntat org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)ntat 
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)ntat org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)ntat org.apache.flink.client.deployment.application.DetachedApplicationRunner.tryExecuteJobs(DetachedApplicationRunner.java:84)ntat org.apache.flink.client.deployment.application.DetachedApplicationRunner.run(DetachedApplicationRunner.java:70)ntat org.apache.flink.runtime.webmonitor.handlers.JarRunOverrideHandler.lambda$handleRequest$3(JarRunOverrideHandler.java:239)ntat java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)ntat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)ntat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)ntat java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)ntat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)ntat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)ntat java.base/java.lang.Thread.run(Thread.java:829)n" + } + } + }, + "job": { + "namespace": "flink-jobs-prod", + "name": "kafka-iceberg-prod", + "facets": { + "jobType": { + "_producer": "<https://github.com/OpenLineage/OpenLineage/tree/1.10.2/integration/flink>", + "_schemaURL": "<https://openlineage.io/spec/facets/2-0-2/JobTypeJobFacet.json#/$defs/JobTypeJobFacet>", + "processingType": "STREAMING", + "integration": "FLINK", + "jobType": "JOB" + } + } + } +}

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 16:24:11
+
+

*Thread Reply:* But the thing is that the Flink job is not really failing

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-02 16:25:03
+
+

*Thread Reply:* interesting, would love to see what @Paweł Leszczyński / @Maciej Obuchowski / @Peter Huang think. This is beyond my depth on the flink integration 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 16:34:51
+
+

*Thread Reply:* Thanks Harel!! Yes please, it would be great to see how OpenLineage can work with AWS Managed Flink

+ + + +
+ ➕ Harel Shein +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 02:43:12
+
+

*Thread Reply:* Just to clarify - is this setup working with the OpenLineage Flink integration turned off? From what I understand, your job emits a cool START event, then the job fails and emits a FAIL event with the error stacktrace The Job Result cannot be fetched through the Job Client when in Web Submission, which is cool as well.

+ +

The question is: does it fail because of the OpenLineage integration, or is OpenLineage just carrying the stacktrace of a failed job? I couldn't see anything OpenLineage-related in the stacktrace.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 09:43:34
+
+

*Thread Reply:* What do you mean by the Flink integration turned off?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 09:44:28
+
+

*Thread Reply:* the Flink job is not failing, but we are receiving an OpenLineage event that says FAIL, after which we don't see the proper DAG in Marquez

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 09:45:18
+
+

*Thread Reply:* does OpenLineage work if the job is submitted through web submission?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 09:47:44
+
+

*Thread Reply:* the answer is "probably not unless you can set up execution.attached beforehand"

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 09:48:49
+
+

*Thread Reply:* execution.attached doesn't seem to work with jobs submitted through web submission.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 09:54:51
+
+

*Thread Reply:* When setting execution.attached to false, I only get the START event, but it doesn't build the DAG in the job space in Marquez

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 09:57:14
+
+

*Thread Reply:*

+ + + + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 09:57:40
+
+

*Thread Reply:* I still see this in cloudwatch logs: locationInformation": "io.openlineage.flink.client.EventEmitter.emit(EventEmitter.java:50)", "logger": "io.openlineage.flink.client.EventEmitter", "message": "Failed to emit OpenLineage event: ", "messageSchemaVersion": "1", "messageType": "ERROR", "threadName": "Flink-DispatcherRestEndpoint-thread-1", "throwableInformation": "io.openlineage.client.transports.HttpTransportResponseException: code: 400, response: \n\tat io.openlineage.client.transports.HttpTransport.throwOnHttpError(HttpTransport.java:151)\n\tat io.openlineage.client.transports.HttpTransport.emit(HttpTransport.java:128)\n\tat io.openlineage.client.transports.HttpTransport.emit(HttpTransport.java:115)\n\tat io.openlineage.client.OpenLineageClient.emit(OpenLineageClient.java:60)\n\tat io.openlineage.flink.client.EventEmitter.emit(EventEmitter.java:48)\n\tat io.openlineage.flink.visitor.lifecycle.FlinkExecutionContext.lambda$onJobSubmitted$0(FlinkExecutionContext.java:66)\n\tat io.openlineage.client.circuitBreaker.NoOpCircuitBreaker.run(NoOpCircuitBreaker.java:27)\n\tat io.openlineage.flink.visitor.lifecycle.FlinkExecutionContext.onJobSubmitted(FlinkExecutionContext.java:59)\n\tat io.openlineage.flink.OpenLineageFlinkJobListener.start(OpenLineageFlinkJobListener.java:180)\n\tat io.openlineage.flink.OpenLineageFlinkJobListener.onJobSubmitted(OpenLineageFlinkJobListener.java:156)\n\tat org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.lambda$executeAsync$12(StreamExecutionEnvironment.java:2099)\n\tat java.base/java.util.ArrayList.forEach(ArrayList.java:1541)\n\tat org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2099)\n\tat org.apache.flink.client.program.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:188)\n\tat org.apache.flink.client.program.StreamContextEnvironment.execute(StreamContextEnvironment.java:119)\n\tat 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1969)\n\tat com.amazonaws.services.msf.KafkaStreamingJob.main(KafkaStreamingJob.java:345)\n\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.base/java.lang.reflect.Method.invoke(Method.java:566)\n\tat org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)\n\tat org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)\n\tat org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)\n\tat org.apache.flink.client.deployment.application.DetachedApplicationRunner.tryExecuteJobs(DetachedApplicationRunner.java:84)\n\tat org.apache.flink.client.deployment.application.DetachedApplicationRunner.run(DetachedApplicationRunner.java:70)\n\tat org.apache.flink.runtime.webmonitor.handlers.JarRunOverrideHandler.lambda$handleRequest$3(JarRunOverrideHandler.java:239)\n\tat java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\n"

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 10:01:52
+
+

*Thread Reply:* I think it will be a limitation of our integration then, at least until https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener - the way we're integrating with Flink requires it to be able to access execution results +https://github.com/OpenLineage/OpenLineage/blob/main/integration/flink/app/src/main/java/io/openlineage/flink/OpenLineageFlinkJobListener.java#L[…]6

+ +

not sure if we can somehow work around this

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 10:04:09
+
+

*Thread Reply:* with that FLIP we wouldn't need execution.attached?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 10:04:58
+
+

*Thread Reply:* Nope - it would add a different mechanism for integrating with Flink other than the JobListener

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 10:09:38
+
+

*Thread Reply:* Could a workaround be, instead of using the HTTP transport, sending the events to Kafka and having a Java/Python client write them to Marquez?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
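The relay Francisco suggests above would have two halves: a Kafka consumer and an HTTP forwarder posting each event to Marquez's lineage endpoint. As a rough sketch of the forwarding half only (the Kafka poll loop is omitted, and the base URL is a placeholder), one could build the POST request like this:

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical forwarder half of a Kafka -> Marquez relay. Only the
// request construction is shown; in a real relay the event JSON would
// come from a Kafka consumer poll loop, and the request would be sent
// with java.net.http.HttpClient.
public class LineageRelaySketch {
    static HttpRequest toMarquezRequest(String marquezBaseUrl, String eventJson) {
        return HttpRequest.newBuilder()
            .uri(URI.create(marquezBaseUrl + "/api/v1/lineage"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(eventJson))
            .build();
    }

    public static void main(String[] args) {
        // Placeholder event payload; a real one is a full OpenLineage RunEvent.
        String event = "{\"eventType\":\"START\",\"job\":{\"namespace\":\"flink-jobs-prod\",\"name\":\"kafka-iceberg-prod\"}}";
        HttpRequest req = toMarquezRequest("http://localhost:5000", event);
        System.out.println(req.method() + " " + req.uri());
        // → POST http://localhost:5000/api/v1/lineage
    }
}
```

This would decouple event emission from Marquez availability, at the cost of running one more component.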
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 10:10:30
+
+

*Thread Reply:* because I just tried with execution.attached set to false and with the console transport, and I just receive the START event but no errors. Not sure if that's the only event Marquez needs to build a DAG

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 10:16:42
+
+

*Thread Reply:* also, wondering: if the event actually reached Marquez, why wouldn't the job DAG be shown?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 10:16:52
+
+

*Thread Reply:* it's the same START event I received when running locally

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 10:17:15
+
+

*Thread Reply:*

+ + + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 10:25:47
+
+

*Thread Reply:* comparison of Marquez receiving the event from Managed Flink on AWS (left) to Marquez on localhost receiving the event from local Flink. It's the same event; however, the Marquez on EC2 is not building the DAG

+ + + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 10:26:14
+
+

*Thread Reply:* @Maciej Obuchowski is there any other event needed for the DAG?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 10:38:48
+
+

*Thread Reply:* &gt; Could a workaround be, instead of using the HTTP transport, sending the events to Kafka and having a Java/Python client write them to Marquez? +I think there are two problems, and the 400 is probably just a follow-up from the original one - maybe the overly long stacktrace makes Marquez reject the event? +The original one, the execution.attached one, is the reason the integration tries to send the FAIL event in the first place

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Peter Huang + (huangzhenqiu0825@gmail.com) +
+
2024-04-03 10:45:35
+
+

*Thread Reply:* For the error described in the message "The Job Result cannot be fetched through the Job Client when in Web Submission.", I feel it is a bug in Flink. Which version of Flink are you using? @Francisco Morillo

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 11:02:46
+
+

*Thread Reply:* looking at the implementation, it seems to be by design: +/** + * A {@link JobClient} that only allows asking for the job id of the job it is attached to. + * + * &lt;p&gt;This is used in web submission, where we do not want the Web UI to have jobs blocking threads + * while waiting for their completion. + */

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
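The design Maciej quotes can be reduced to a toy model (these are NOT the real Flink classes, just illustrative stand-ins): under web submission the JobClient only knows the job id and throws when asked for the execution result, so any listener that calls getJobExecutionResult, as the OpenLineage JobListener does, sees a failure even though the job itself keeps running.

```java
// Toy model of the web-submission limitation. WebSubmissionJobClient here
// mimics the behavior of Flink's class of the same name; it is not the
// actual Flink implementation.
public class WebSubmissionSketch {
    interface JobClient {
        String getJobId();
        Object getJobExecutionResult();
    }

    static class WebSubmissionJobClient implements JobClient {
        private final String jobId;
        WebSubmissionJobClient(String jobId) { this.jobId = jobId; }
        public String getJobId() { return jobId; }
        public Object getJobExecutionResult() {
            // Asking for the result is disallowed by design in web submission.
            throw new RuntimeException(
                "The Job Result cannot be fetched through the Job Client when in Web Submission.");
        }
    }

    public static void main(String[] args) {
        JobClient client = new WebSubmissionJobClient("job-1");
        try {
            client.getJobExecutionResult(); // what a result-fetching listener would do
        } catch (RuntimeException e) {
            // the listener treats this as a job failure and emits a FAIL event
            System.out.println("FAIL event emitted: " + e.getMessage());
        }
    }
}
```

This matches the symptom in the thread: the job keeps running, but the integration reports FAIL because fetching the result is forbidden, not because the job failed.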
Peter Huang + (huangzhenqiu0825@gmail.com) +
+
2024-04-03 11:32:51
+
+

*Thread Reply:* Yes, it looks like the Flink code tries to fetch the Job Result for the web-submission job, thus the exception is raised.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 12:27:05
+
+

*Thread Reply:* Flink 1.15.2

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 12:28:00
+
+

*Thread Reply:* But still, wouldn't Marquez be able to build the DAG with the START event?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 12:28:50
+
+

*Thread Reply:* In Marquez, a new dataset version is created when the run completes

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 12:29:14
+
+

*Thread Reply:* but that doesn't show as events in Marquez, right?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 12:29:33
+
+

*Thread Reply:* I think that was going to be changed for streaming jobs - right @Paweł Leszczyński? - but not sure if that's already merged

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 12:33:34
+
+

*Thread Reply:* in the latest Marquez version?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 12:41:52
+
+

*Thread Reply:* is this the right transport url? props.put("openlineage.transport.url","http://localhost:5000/api/v1/lineage");

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
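For reference, the transport settings under discussion might be laid out as below. The key names follow the pattern Francisco used and are assumptions, not verified against a specific client version; in particular, whether openlineage.transport.url should carry the full /api/v1/lineage path or only the base URL with a separate endpoint setting depends on the OpenLineage client version, so both variants are shown for comparison.

```java
import java.util.Properties;

// Two candidate HTTP-transport configurations for the OpenLineage Flink
// integration. Key names mirror the thread's props.put(...) usage and
// should be checked against the client version actually deployed.
public class TransportConfigSketch {
    // Variant 1: full lineage path embedded in the URL (as used in the thread).
    static Properties fullUrlVariant() {
        Properties props = new Properties();
        props.put("openlineage.transport.type", "http");
        props.put("openlineage.transport.url", "http://localhost:5000/api/v1/lineage");
        return props;
    }

    // Variant 2: base URL plus an explicit endpoint path.
    static Properties baseUrlVariant() {
        Properties props = new Properties();
        props.put("openlineage.transport.type", "http");
        props.put("openlineage.transport.url", "http://localhost:5000");
        props.put("openlineage.transport.endpoint", "/api/v1/lineage");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(fullUrlVariant());
        System.out.println(baseUrlVariant());
    }
}
```

If events arrive locally with one variant but not in another environment, the transport URL itself is less likely to be the problem than the backend rejecting the payload, as the later 400 in this thread suggests.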
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 12:42:36
+
+

*Thread Reply:* because I was able to see streaming jobs in Marquez when running locally, as well as having a local Flink job write to the Marquez on EC2. It's as if the dataset and job don't get created in Marquez from the event

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 13:05:28
+
+

*Thread Reply:* I tried with Flink 1.18 and it's the same: I receive the START event but the job and dataset are not created in Marquez

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 13:15:59
+
+

*Thread Reply:* If I try locally and set execution.attached to false, it does work. So it seems that the main issue is that OpenLineage doesn't work with Flink jobs submitted through the web UI

+ + + +
+ 👀 Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Peter Huang + (huangzhenqiu0825@gmail.com) +
+
2024-04-03 16:54:20
+
+

*Thread Reply:* From my understanding so far, setting execution.attached = false mitigates the exception in Flink (at least that is the logic in the Flink code). On the other hand, the question is when to build the DAG upon receiving events. @Paweł Leszczyński In our org, we changed the default behavior: the Flink listener periodically sends RUNNING events out. Once the lineage backend receives a RUNNING event, a new DAG is created.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 17:00:26
+
+

*Thread Reply:* How can I configure that?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Peter Huang + (huangzhenqiu0825@gmail.com) +
+
2024-04-03 17:02:00
+
+

*Thread Reply:* To send periodic RUNNING events, some changes are needed in the OpenLineage Flink lib. Let's wait for @Paweł Leszczyński for a concrete plan. I am glad to create a PR for this.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 17:05:28
+
+

*Thread Reply:* I'm still wondering why the DAG was not created in Marquez, unless there are other events that OpenLineage sends to build the job and dataset that don't go out when submitting through the web UI. I will try to replicate it in EMR

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 20:37:03
+
+

*Thread Reply:* Looking at the Marquez logs, I'm seeing this

+ +

arquez.api.OpenLineageResource: Unexpected error while processing request +! java.lang.IllegalArgumentException: namespace '<kafka://b-1.mskflinkopenlineage.fdz2z7.c22.kafka.us-east-1.amazonaws.com:9092>,b-3.mskflinkopenlineage.fdz2z7.c22.kafka.us-east-1.amazonaws.com:9092,b_2.mskflinkopenlineage.fdz2z7.c22.kafka.us_east_1.amazonaws.com:9092' must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), at (@), plus (+), dashes (-), colons (:), equals (=), semicolons (;), slashes (/) or dots (.) with a maximum length of 1024 characters.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 20:37:38
+
+

*Thread Reply:* can Marquez work with MSK?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-04 02:43:06
+
+

*Thread Reply:* The graph on the Marquez side should be present just after sending the START event, as long as the START contains information about input/output datasets. Commas are the problem here, and we should modify the Flink integration to separate the broker list with semicolons.

+ + + +
+ ✅ Francisco Morillo +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
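The workaround Paweł describes can be sketched in a few lines: Marquez rejects dataset namespaces containing commas, so a comma-separated Kafka bootstrap-server list has to be joined with semicolons before being used as a namespace. The helper name and the "kafka://" prefix here are illustrative, not the actual integration code.

```java
// Minimal sketch of namespace sanitization for a Kafka broker list,
// assuming the fix discussed in the thread (commas -> semicolons).
public class NamespaceSanitizer {
    /** Replace commas so the namespace passes Marquez's character validation. */
    static String kafkaNamespace(String bootstrapServers) {
        return "kafka://" + bootstrapServers.trim().replace(",", ";");
    }

    public static void main(String[] args) {
        String brokers = "b-1.example.com:9092,b-2.example.com:9092";
        System.out.println(kafkaNamespace(brokers));
        // → kafka://b-1.example.com:9092;b-2.example.com:9092
    }
}
```

With the commas gone, the namespace satisfies the allowed character set quoted in the Marquez error above, and the START event's input dataset can be registered.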
Fabio Manganiello + (fabio.manganiello@booking.com) +
+
2024-04-03 05:50:05
+
+

Hi all, I've opened a PR for the dbt-ol script. We've noticed that the script doesn't transparently return/exit with the exit code of the child dbt process. This makes it hard for the parent process to tell whether the underlying workflow succeeded or failed - in the case of Airflow, the parent DAG will mark the job as succeeded even if it actually failed. Let me know if you have thoughts/comments (cc @Arnab Bhattacharyya)

+
+ + + + + + + +
+
Labels
+ integration/dbt +
+ + + + + + + + + + +
+ + + +
+ ❤️ Harel Shein +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Tristan GUEZENNEC -CROIX- + (tristan.guezennec@decathlon.com) +
+
2024-04-04 04:41:36
+
+

*Thread Reply:* @Sophie LY FYI

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Fabio Manganiello + (fabio.manganiello@booking.com) +
+
2024-04-03 06:33:34
+
+

Is there a timeline for the 1.11.0 release? Now that the dbt-ol fix has been merged we may either wait for the release or temporarily point to main

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-03 06:34:09
+
+

*Thread Reply:* I think it’s going to be today or really soon. cc: @Michael Robinson

+ + + +
+ 🎉 Fabio Manganiello +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 06:37:45
+
+

*Thread Reply:* would be great if we could fix the unknown facet memory issue in this release, I think @Paweł Leszczyński @Damien Hawes are working on it

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 06:38:02
+
+

*Thread Reply:* I think this is a critical kind of bug

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 06:39:27
+
+

*Thread Reply:* Yeah, it's a tough-to-figure-out-where-the-fix-should-be kind of bug.

+ + + +
+ 😨 Jakub Dardziński +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 06:39:56
+
+

*Thread Reply:* The solution is simple, at least in my mind. If spark_unknown is disabled, don't accumulate state.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 06:40:11
+
+

*Thread Reply:* I think we should go first with the unknown entry facet, as it has the bigger impact

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 06:40:12
+
+

*Thread Reply:* if there's no better fast idea, just disable that facet for now?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 06:40:26
+
+

*Thread Reply:* It doesn't matter if the facet is disabled or not

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 06:40:38
+
+

*Thread Reply:* The UnknownEntryFacetListener still accumulates state

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 06:40:48
+
+

*Thread Reply:* @Damien Hawes will you be able to prepare this today/tomorrow?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 06:40:58
+
+

*Thread Reply:* disable == comment/remove code related to it, together with UnknownEntryFacetListener 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 06:40:59
+
+

*Thread Reply:* I'm working on it today

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 06:41:01
+
+

*Thread Reply:* in this case 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 06:41:31
+
+

*Thread Reply:* You're proposing to rip the code out completely?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 06:42:02
+
+

*Thread Reply:* at least for this release - I think it's better to release code without it and without memory bug, rather than having it bugged as it is

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 06:42:06
+
+

*Thread Reply:* The only place where I see it being applied is here:

+ +

``` private <L extends LogicalPlan> QueryPlanVisitor<L, D> asQueryPlanVisitor(T event) {
+  AbstractQueryPlanDatasetBuilder<T, P, D> builder = this;
+  return new QueryPlanVisitor<L, D>(context) {
+    @Override
+    public boolean isDefinedAt(LogicalPlan x) {
+      return builder.isDefinedAt(event) && isDefinedAtLogicalPlan(x);
+    }
+
+    @Override
+    public List&lt;D&gt; apply(LogicalPlan x) {
+      unknownEntryFacetListener.accept(x);
+      return builder.apply(event, (P) x);
+    }
+  };
+}```

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 06:42:11
+
+

*Thread Reply:* come on, this should be few lines of change

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 06:42:17
+
+

*Thread Reply:* Inside: AbstractQueryPlanDatasetBuilder

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 06:42:21
+
+

*Thread Reply:* once we know what it is

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 06:42:32
+
+

*Thread Reply:* it's useful in some narrow debug cases, but the memory bug potentially impacts all

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 06:43:15
+
+

*Thread Reply:* openLineageContext + .getQueryExecution() + .filter(qe -&gt; !FacetUtils.isFacetDisabled(openLineageContext, "spark_unknown")) + .flatMap(qe -&gt; unknownEntryFacetListener.build(qe.optimizedPlan())) + .ifPresent(facet -&gt; runFacetsBuilder.put("spark_unknown", facet)); +this should always clean the listener

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 06:43:19
+
+

*Thread Reply:* @Paweł Leszczyński - every time AbstractQueryPlanDatasetBuilder#apply is called, the UnknownEntryFacetListener is invoked

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 06:43:38
+
+

*Thread Reply:* the code is within OpenLineageRunEventBuilder

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 06:43:50
+
+

*Thread Reply:* @Paweł Leszczyński - it will only clean the listener if spark_unknown is enabled

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 06:43:56
+
+

*Thread Reply:* because of that filter step

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 06:44:11
+
+

*Thread Reply:* but the listener still accumulates state, regardless of that snippet you shared

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 06:44:12
+
+

*Thread Reply:* yes, and we need to modify it to always clean

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 06:45:45
+
+

*Thread Reply:* We have a difference in understanding here, I think.

+ +
  1. If spark_unknown is disabled, the UnknownEntryFacetListener still accumulates state. Your proposed change will not clean that state.
  2. If spark_unknown is enabled, well, sometimes we get StackOverflow errors due to infinite recursion during serialisation.
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 06:46:35
+
+

*Thread Reply:* just to step back a bit from the particular solution: I would love it if we could release with either

+ +
  1. a proper fix that won't accumulate memory if the facet is disabled, and that cleans it up if it's not
  2. have that facet removed for now +I don't want to have a release now that will contain this bug, because we're trying to do a "good" solution but have no time to do it properly for the release
+ + + +
+ 👍 Damien Hawes +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 06:46:57
+
+

*Thread Reply:* I think the impact of this bug is big

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 06:47:24
+
+

*Thread Reply:* My opinion is that perhaps the OpenLineageContext object needs to be extended to hold which facets are enabled / disabled.

+ + + +
+ ➕ Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 06:47:52
+
+

*Thread Reply:* This way, things that inherit from AbstractQueryPlanDatasetBuilder can check, should they be a no-op or not

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 06:48:36
+
+

*Thread Reply:* Or, +```private <L extends LogicalPlan> QueryPlanVisitor<L, D> asQueryPlanVisitor(T event) { + AbstractQueryPlanDatasetBuilder<T, P, D> builder = this; + return new QueryPlanVisitor<L, D>(context) { + @Override + public boolean isDefinedAt(LogicalPlan x) { + return builder.isDefinedAt(event) && isDefinedAtLogicalPlan(x); + }

+ +
@Override
+public List&lt;D&gt; apply(LogicalPlan x) {
+  unknownEntryFacetListener.accept(x);
+  return builder.apply(event, (P) x);
+}
+
+ +

}; +}``` +This needs to be changed

+ + + +
+
+
+
+ + + + + +
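The guard being discussed above - don't invoke the listener at all when the facet is disabled, and always clear whatever it did accumulate once the facet is built - can be sketched outside of Spark. This is a hypothetical illustration in Python of the pattern, not the actual OpenLineage Java code; the class and function names are made up for the sketch:

```python
# Hypothetical sketch: only accumulate state in the listener when the
# spark_unknown facet is enabled, and always clear the accumulated state
# once the facet has been built.

class UnknownEntryListener:
    def __init__(self):
        self.visited = []          # stands in for the map of visited plan nodes

    def accept(self, plan):
        self.visited.append(plan)

    def build_facet(self):
        facet = list(self.visited)
        self.visited.clear()       # always clean, even if the facet is discarded
        return facet

def apply_plan(listener, plan, facet_enabled):
    # Mirrors the idea of checking the facet flag before calling the
    # listener, instead of unconditionally accumulating state.
    if facet_enabled:
        listener.accept(plan)
    return f"datasets({plan})"

listener = UnknownEntryListener()
apply_plan(listener, "LogicalPlan-1", facet_enabled=False)
assert listener.visited == []      # disabled facet: no state accumulated

apply_plan(listener, "LogicalPlan-2", facet_enabled=True)
assert listener.build_facet() == ["LogicalPlan-2"]
assert listener.visited == []      # cleared after emitting the facet
```

The two assertions correspond to the two problems raised in the thread: no accumulation when the facet is off, and guaranteed cleanup when it is on.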
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 06:48:40
+
+

*Thread Reply:* @Damien Hawes could u look at this again https://github.com/OpenLineage/OpenLineage/pull/2557/files ?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 06:49:27
+
+

*Thread Reply:* i think clearing visitedNodes within populateRun should solve this

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 06:51:01
+
+

*Thread Reply:* the solution is (1) don't store logical plans, but their string representation (2) clear what you collected after populating a facet

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 06:51:18
+
+

*Thread Reply:* even if it works, I still don't really like it because we accumulate state in asQueryPlanVisitor just to clear it later

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 06:51:19
+
+

*Thread Reply:* It works, but I'm still annoyed that UnknownEntryFacetListener is being called in the first place

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 06:51:46
+
+

*Thread Reply:* also i think in case of really large plans it could be an issue still?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 06:53:06
+
+

*Thread Reply:* why @Maciej Obuchowski?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 06:55:47
+
+

*Thread Reply:* we've seen >20MB serialized logical plans, and that's what essentially treeString does if I understand it correctly

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 06:56:56
+
+

*Thread Reply:* and then the serialization can potentially still take some time...

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 07:01:19
+
+

*Thread Reply:* where did you find treeString serializes a plan?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 07:05:44
+
+

*Thread Reply:* treeString is used by the default toString method of TreeNode, so it would be super weird if they serialized the entire object within it. I couldn't find any such code within the Spark implementation

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 07:19:02
+
+

*Thread Reply:* I also remind you, that there is the problem with the job metrics holder as well

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 07:19:17
+
+

*Thread Reply:* That will also, eventually, cause an OOM crash

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 07:27:41
+
+

*Thread Reply:* So, I agree UnknownEntryFacetListener code should not be called if a facet is disabled. I agree we should have another PR and fix for job metrics.

+ +

The question is: what do we want to have shipped within the next release? Do we want to get rid of the static member that accumulates all the logical plans (which is the cleaner approach) or just clear it once it's not needed anymore? I think we'll need to clear it anyway in case someone turns the unknown facet feature on.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 07:39:09
+
+

*Thread Reply:* In my opinion, the approach for the immediate release is to clear the plans. Though, I'd like tests that prove it works.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 08:10:02
+
+

*Thread Reply:* @Damien Hawes so let's go with Paweł's PR?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 08:24:04
+
+

*Thread Reply:* So, proving that this helps would be great. One option would be to prepare an integration test that runs something and verifies later on that the private static map is empty. Another, way nicer, option would be to write code that generates a few-MB dataset, reads it into memory and saves it into a file, and then within the integration test runs something like https://github.com/jerolba/jmnemohistosyne to see the memory consumption of the classes we're interested in (not sure how difficult such a thing is to write)

+ +

This could be also beneficial to prevent similar issues in future and solve job metrics issue.

+
+ + + + + + + +
+
Stars
+ 15 +
+ +
+
Language
+ Java +
+ + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 09:02:37
+
+

*Thread Reply:* @Damien Hawes @Paweł Leszczyński would be great to clarify if you're working on it now

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 09:02:43
+
+

*Thread Reply:* as this blocks release

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 09:02:47
+
+

*Thread Reply:* fyi @Michael Robinson

+ + + +
+ 👍 Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 09:48:12
+
+

*Thread Reply:* I can try to prove that the PR I proposed brings improvement. However, if Damien wants to work on his approach targeting this release, I am happy to hand it over.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 10:20:24
+
+

*Thread Reply:* I'm not working on it at the moment. I think Pawel's approach is fine for the time being.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 10:20:31
+
+

*Thread Reply:* I'll focus on the JobMetricsHolder problem

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 10:24:54
+
+

*Thread Reply:* Side note: @Paweł Leszczyński @Maciej Obuchowski - are you able to give any guidance why the UnknownEntryFacetListener was implemented that way, as opposed to just examining the event in a stateless manner?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:18:28
+
+

*Thread Reply:* OK. @Paweł Leszczyński @Maciej Obuchowski - I think I found the memory leak with JobMetricsHolder. If we receive an event like SparkListenerJobStart, but there isn't any dataset in it, it looks like we're storing the metrics, but we never get rid of them.

+ + + +
+ 😬 Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:21:50
+
+

*Thread Reply:* Here's the logs

+ +
+ + + + + + + +
+ + +
+ 🙌 Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 11:36:50
+
+

*Thread Reply:* &gt; Side note: @Paweł Leszczyński @Maciej Obuchowski - are you able to give any guidance why the UnknownEntryFacetListener was implemented that way, as opposed to just examining the event in a stateless manner? +It's one of the older parts of the codebase, implemented mostly in 2021 by a person no longer associated with the project... hard to tell to be honest 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 11:37:52
+
+

*Thread Reply:* but I think we have much more freedom to modify it, as it's not a standardized or user-facing feature

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 11:47:02
+
+

*Thread Reply:* to solve the stageMetrics issue - should they always be kept in a separate Map per job, associated with the jobId, so it can be easily cleaned... but there's no jobId on SparkListenerTaskEnd

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:47:16
+
+

*Thread Reply:* Nah

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:47:17
+
+

*Thread Reply:* Actually

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:47:21
+
+

*Thread Reply:* Its simpler than that

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:47:36
+
+

*Thread Reply:* The bug is here:

+ +

public void cleanUp(int jobId) { + Set&lt;Integer&gt; stages = jobStages.remove(jobId); + stages = stages == null ? Collections.emptySet() : stages; + stages.forEach(jobStages::remove); + }

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:47:51
+
+

*Thread Reply:* We remove from jobStages N + 1 times

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:48:14
+
+

*Thread Reply:* JobStages is supposed to carry a mapping from Job -&gt; Stage

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:48:30
+
+

*Thread Reply:* and stageMetrics a mapping from Stage -&gt; TaskMetrics

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 11:49:00
+
+

*Thread Reply:* ah yes

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:49:03
+
+

*Thread Reply:* Here, we remove the job from jobStages, and obtain the associated stages, and then we use those stages to remove from jobStages again

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:49:11
+
+

*Thread Reply:* It's a "huh?" moment

+ + + +
+ 😂 Jakub Dardziński +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:49:53
+
+

*Thread Reply:* The amount of logging I added, just to see this, was crazy

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 11:50:46
+
+

*Thread Reply:* public void cleanUp(int jobId) { + Set&lt;Integer&gt; stages = jobStages.remove(jobId); + stages = stages == null ? Collections.emptySet() : stages; + stages.forEach(stageMetrics::remove); + } +so it's just jobStages -> stageMetrics here, right?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:50:57
+
+

*Thread Reply:* Yup

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 11:51:09
+
+

*Thread Reply:* yeah it looks so obvious after seeing that 😄

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:51:40
+
+

*Thread Reply:* I even wrote a separate method to clear the stageMetrics map

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 11:51:41
+
+

*Thread Reply:* it was there since 2021 in that form 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:52:00
+
+

*Thread Reply:* and placed it in the same locations as the cleanUp method in the OpenLineageSparkListener

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:52:09
+
+

*Thread Reply:* Wrote a unit test

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:52:12
+
+

*Thread Reply:* It fails

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:52:17
+
+

*Thread Reply:* and I was like, "why?"

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:52:25
+
+

*Thread Reply:* Investigate further, and then I noticed this method

+ + + +
+ 😄 Maciej Obuchowski +
+ +
+
+
+
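The one-word bug traced above - removing the stage ids from jobStages (a no-op) instead of from stageMetrics - is easy to reproduce in isolation. A minimal sketch, with plain Python dicts standing in for the Java maps and hypothetical function names:

```python
# Minimal repro of the cleanUp bug: job_stages maps job -> stages,
# stage_metrics maps stage -> metrics.

def clean_up_buggy(job_id, job_stages, stage_metrics):
    stages = job_stages.pop(job_id, set())
    for stage in stages:
        job_stages.pop(stage, None)     # bug: wrong map, stage_metrics leaks

def clean_up_fixed(job_id, job_stages, stage_metrics):
    stages = job_stages.pop(job_id, set())
    for stage in stages:
        stage_metrics.pop(stage, None)  # fix: release the per-stage metrics

job_stages = {1: {10, 11}}
stage_metrics = {10: "metrics-a", 11: "metrics-b"}
clean_up_buggy(1, job_stages, stage_metrics)
assert stage_metrics == {10: "metrics-a", 11: "metrics-b"}  # nothing freed

job_stages = {1: {10, 11}}
stage_metrics = {10: "metrics-a", 11: "metrics-b"}
clean_up_fixed(1, job_stages, stage_metrics)
assert stage_metrics == {}  # metrics released with the job
```

With the buggy version, stage metrics accumulate for the lifetime of the application, which is the slow OOM described in the thread.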
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 12:33:42
+ +
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-04-03 14:39:06
+
+

*Thread Reply:* Has Damien's PR unblocked the release?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 14:39:33
+
+

*Thread Reply:* No, we need one more from Paweł

+ + + +
+ :gratitude_thank_you: Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-04 10:37:42
+
+

*Thread Reply:* OK. Pawel's PR has been merged @Michael Robinson

+ + + +
+ 👍 Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-04 12:12:28
+
+

*Thread Reply:* Given these developments, I'd like to call for a release of 1.11.0 to happen today, unless there are any objections.

+ + + +
+ ➕ Harel Shein, Jakub Dardziński +
+ +
+ 👀 Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-04-04 12:28:38
+
+

*Thread Reply:* Changelog PR is RFR: https://github.com/OpenLineage/OpenLineage/pull/2574

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-04 14:29:04
+
+

*Thread Reply:* CircleCI has problems

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Fabio Manganiello + (fabio.manganiello@booking.com) +
+
2024-04-04 18:12:27
+
+

*Thread Reply:* ```self = <tests.conftest.DagsterRunLatestProvider object at 0x7fcd84faed60> +repository_name = 'test_repo'

+ +
def get_instance(self, repository_name: str) -&gt; DagsterRun:
+
+ +

&gt; from dagster.core.remote_representation.origin import ( + ExternalJobOrigin, + ExternalRepositoryOrigin, + InProcessCodeLocationOrigin, + ) +E ImportError: cannot import name 'ExternalJobOrigin' from 'dagster.core.remote_representation.origin' (/home/circleci/.pyenv/versions/3.8.19/lib/python3.8/site-packages/dagster/core/remote_representation/origin.py)

+ +

tests/conftest.py:140: ImportError```

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Fabio Manganiello + (fabio.manganiello@booking.com) +
+
2024-04-04 18:12:39
+
+

*Thread Reply:* &gt;&gt;&gt; from dagster.core.remote_representation.origin import ( +... ExternalJobOrigin, +... ExternalRepositoryOrigin, +... InProcessCodeLocationOrigin, +... ) +Traceback (most recent call last): + File "&lt;stdin&gt;", line 1, in &lt;module&gt; + File "&lt;frozen importlib._bootstrap&gt;", line 1176, in _find_and_load + File "&lt;frozen importlib._bootstrap&gt;", line 1138, in _find_and_load_unlocked + File "&lt;frozen importlib._bootstrap&gt;", line 1078, in _find_spec + File "/home/blacklight/git_tree/OpenLineage/venv/lib/python3.11/site-packages/dagster/_module_alias_map.py", line 36, in find_spec + assert base_spec, f"Could not find module spec for {base_name}." +AssertionError: Could not find module spec for dagster._core.remote_representation. +&gt;&gt;&gt; from dagster.core.host_representation.origin import ( +... ExternalJobOrigin, +... ExternalRepositoryOrigin, +... InProcessCodeLocationOrigin, +... ) +&gt;&gt;&gt; ExternalJobOrigin +&lt;class 'dagster._core.host_representation.origin.ExternalJobOrigin'&gt;

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Fabio Manganiello + (fabio.manganiello@booking.com) +
+
2024-04-04 18:13:07
+
+

*Thread Reply:* It seems that the parent module should be dagster.core.host_representation.origin, not dagster.core.remote_representation.origin

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-04 18:14:55
+
+

*Thread Reply:* did you rebase? for >=1.6.9 it’s dagster.core.remote_representation.origin, should be ok

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Fabio Manganiello + (fabio.manganiello@booking.com) +
+
2024-04-04 18:18:06
+
+

*Thread Reply:* Indeed, I was just looking at https://github.com/dagster-io/dagster/pull/20323 (merged 4 weeks ago)

+
+ + + + + + + +
+
Comments
+ 1 +
+ + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Fabio Manganiello + (fabio.manganiello@booking.com) +
+
2024-04-04 18:18:43
+
+

*Thread Reply:* I did a pip install of the integration from main and it seems to install a previous version though:

+ +

&gt;&gt;&gt; dagster.__version__ +'1.6.5'

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-04 18:18:59
+
+

*Thread Reply:* try --force-reinstall maybe

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-04 18:19:08
+
+

*Thread Reply:* it works fine for me, CI doesn’t crash either

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Fabio Manganiello + (fabio.manganiello@booking.com) +
+
2024-04-04 18:20:09
+
+

*Thread Reply:* https://app.circleci.com/pipelines/github/OpenLineage/OpenLineage/10020/workflows/4d3a33b4-47ef-4cf6-b6de-1bb95611fad7/jobs/200011 (although the ImportError seems to be different from mine)

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-04 18:20:53
+
+

*Thread Reply:* huh, how didn’t I see this

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-04 18:21:30
+
+

*Thread Reply:* I think we should limit the upper version of dagster, it’s not even really maintained

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Fabio Manganiello + (fabio.manganiello@booking.com) +
+
2024-04-04 18:28:14
+
+

*Thread Reply:* I've also just noticed that ExternalJobOrigin and ExternalRepositoryOrigin have been renamed to RemoteJobOrigin and RemoteRepositoryOrigin on 1.7.0 - and that's apparently the version the CI installed

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-04 18:28:32
+
+

*Thread Reply:* https://github.com/OpenLineage/OpenLineage/pull/2579

+ + + +
+ 👍 Fabio Manganiello +
+ +
+
+
+
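The rename Fabio found (ExternalJobOrigin to RemoteJobOrigin in dagster 1.7.0) is exactly the kind of break an upper version pin avoids; until a pin lands, a try-each-location import is a common workaround. A generic sketch of the pattern - the `import_first` helper is hypothetical, and the demo uses stdlib names since dagster may not be installed:

```python
# Hypothetical helper illustrating the version-compat import pattern for
# renames such as ExternalJobOrigin -> RemoteJobOrigin: try each candidate
# location in order and use the first one that resolves.
from importlib import import_module

def import_first(*candidates):
    """candidates: (module_path, symbol_name) pairs, newest first."""
    errors = []
    for module_path, symbol in candidates:
        try:
            return getattr(import_module(module_path), symbol)
        except (ImportError, AttributeError) as exc:
            errors.append(str(exc))
    raise ImportError("; ".join(errors))

# Demonstrated with stdlib names: the first candidate exists, so the
# fallback is never reached.
tau = import_first(("math", "tau"), ("math", "pi"))
assert tau > 6

# When the first candidate is missing, the fallback is used instead.
pi = import_first(("math", "no_such_name"), ("math", "pi"))
assert 3.14 < pi < 3.15
```

In the dagster case the candidate pairs would be the new and old module paths from the thread, tried in that order.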
+ + + + + +
+
+ + + + +
+ +
Mantas Mykolaitis + (mantasmy@wix.com) +
+
2024-04-03 07:24:26
+
+

Hey 👋 +When I am running TrinoOperator on Airflow 2.7 I am getting this: +[2024-04-03, 11:10:44 UTC] {base.py:162} WARNING - OpenLineage provider method failed to extract data from provider. +[2024-04-03, 11:10:44 UTC] {manager.py:276} WARNING - Extractor returns non-valid metadata: None +I've upgraded apache-airflow-providers-openlineage to 1.6.0 (maybe it is too new for Airflow 2.7?). +And due to the warning I end up with empty input/output facets... It seems that it is not able to connect to Trino and extract the table structure... When I tried on our prod Airflow version (2.6.3) with openlineage-airflow it was able to connect and extract the table structure, but not to do the column-level lineage mapping.

+ +

Any input would be very helpful. +Thanks

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mantas Mykolaitis + (mantasmy@wix.com) +
+
2024-04-03 07:28:29
+
+

*Thread Reply:* Tried with the default version of the OL plugin that comes with Airflow 2.7 (1.0.1) and the result was the same

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-03 07:31:55
+
+

*Thread Reply:* Could you please enable DEBUG logs in Airflow and provide them?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mantas Mykolaitis + (mantasmy@wix.com) +
+
2024-04-03 07:42:14
+
+

*Thread Reply:*

+ +
+ + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-03 07:50:30
+
+

*Thread Reply:* thanks +it seems like only the beginning of the logs. I’m assuming it fails on the complete event

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mantas Mykolaitis + (mantasmy@wix.com) +
+
2024-04-03 07:56:00
+
+

*Thread Reply:* I am sorry! This is the full log

+ +
+ + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mantas Mykolaitis + (mantasmy@wix.com) +
+
2024-04-03 08:00:03
+
+

*Thread Reply:* What I also just realised is that we have our own TrinoOperator implementation, which inherits from SQLExecuteQueryOperator (same as the original TrinoOperator)... So maybe inlets and outlets aren't being set due to that

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-03 08:00:52
+
+

*Thread Reply:* yeah, it could interfere

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mantas Mykolaitis + (mantasmy@wix.com) +
+
2024-04-03 08:01:04
+
+

*Thread Reply:* But task was rather simple: +create_table_apps_log_test = TrinoOperator( + task_id=f"create_table_test", + sql=""" + CREATE TABLE if not exists mytable as + SELECT app_id, msid, instance_id from table limit 1 + """ +)

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-03 08:01:26
+
+

*Thread Reply:* do you use some other hook to connect to Trino?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mantas Mykolaitis + (mantasmy@wix.com) +
+
2024-04-03 08:03:12
+
+

*Thread Reply:* Just checked. So we have our own hook to connect to Trino... that inherits from TrinoHook 🙄

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-03 08:06:05
+
+

*Thread Reply:* hard to say, you could check https://github.com/apache/airflow/blob/main/airflow/providers/trino/hooks/trino.py#L252 to see how the integration collects the basic information on how to retrieve the connection

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mantas Mykolaitis + (mantasmy@wix.com) +
+
2024-04-03 08:10:24
+
+

*Thread Reply:* Just wondering why it worked with Airflow 2.6.3 and the openlineage-airflow package - it seems it was accessing Trino differently

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mantas Mykolaitis + (mantasmy@wix.com) +
+
2024-04-03 08:10:40
+
+

*Thread Reply:* But anyways, will try to look more into it. Thanks for tips!

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-03 08:12:13
+
+

*Thread Reply:* please let me know your findings, it might be some bug introduced in provider package

+ + + +
+ 👍 Mantas Mykolaitis +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-04-03 08:29:07
+
+

Looking for some help with spark and the “UNCLASSIFIED_ERROR; An error occurred while calling o110.load. Cannot call methods on a stopped SparkContext.” We are not getting any openLineage data in Cloudwatch nor in sparkHistoryLogs. +(more details in thread - should I be making this into a github issue instead?)

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-04-03 08:29:29
+
+

*Thread Reply:* The python code:

+ +

import sys +from awsglue.transforms import * +from awsglue.utils import getResolvedOptions +from pyspark.context import SparkContext +from pyspark.conf import SparkConf +from awsglue.context import GlueContext +from awsglue.job import Job

+ +

conf = SparkConf() +conf.set("spark.extraListeners","io.openlineage.spark.agent.OpenLineageSparkListener")\ + .set("spark.jars.packages","io.openlineage:openlineage_spark:1.10.2")\ + .set("spark.openlineage.version","v1")\ + .set("spark.openlineage.namespace","OL_EXAMPLE_DN")\ + .set("spark.openlineage.transport.type","console") +## @params: [JOB_NAME] +args = getResolvedOptions(sys.argv, ['JOB_NAME'])

+ +

sc = SparkContext.getOrCreate(conf=conf) +glueContext = GlueContext(sc) +spark = glueContext.spark_session +job = Job(glueContext) +job.init(args['JOB_NAME'], args) +df = spark.read.format("csv").option("header","true").load("<s3-folder-path>") +df.write.format("csv").option("header","true").save("<s3-folder-path>",mode='overwrite') +job.commit()

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-04-03 08:29:32
+
+

*Thread Reply:* Nothing appears in cloudwatch, or in the sparkHistoryLogs. Here's the jr_runid file from sparkHistoryLogs - it shows that the work was done, but nothing about openlineage or where the spark session was stopped before OL could do anything: +{ + "Event": "SparkListenerApplicationStart", + "App Name": "nativespark-check_python_-jr_<jrid>", + "App ID": "spark-application-0", + "Timestamp": 0, + "User": "spark" +} +{ + "Event": "org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart", + "executionId": 0, + "description": "load at NativeMethodAccessorImpl.java:0", + "details": "org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:185)\nsun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\nsun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\nsun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\njava.lang.reflect.Method.invoke(Method.java:498)\npy4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)\npy4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)\npy4j.Gateway.invoke(Gateway.java:282)\npy4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)\npy4j.commands.CallCommand.execute(CallCommand.java:79)\npy4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)\npy4j.ClientServerConnection.run(ClientServerConnection.java:106)\njava.lang.Thread.run(Thread.java:750)", + "physicalPlanDescription": "== Parsed Logical Plan ==\nGlobalLimit 1\n+- LocalLimit 1\n +- Filter (length(trim(value#7, None)) > 0)\n +- Project [value#0 AS value#7]\n +- Project [value#0]\n +- Relation [value#0] text\n\n== Analyzed Logical Plan ==\nvalue: string\nGlobalLimit 1\n+- LocalLimit 1\n +- Filter (length(trim(value#7, None)) > 0)\n +- Project [value#0 AS value#7]\n +- Project [value#0]\n +- Relation [value#0] text\n\n== Optimized Logical Plan ==\nGlobalLimit 1\n+- LocalLimit 1\n +- Filter (length(trim(value#0, None)) > 0)\n +- Relation [value#0] 
text\n\n== Physical Plan ==\nCollectLimit 1\n+- **(1) Filter (length(trim(value#0, None)) > 0)\n +- FileScan text [value#0] Batched: false, DataFilters: [(length(trim(value#0, None)) > 0)], Format: Text, Location: InMemoryFileIndex(1 paths)[<s3-csv-file>], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<value:string>\n", + "sparkPlanInfo": { + "nodeName": "CollectLimit", + "simpleString": "CollectLimit 1", + "children": [ + { + "nodeName": "WholeStageCodegen (1)", + "simpleString": "WholeStageCodegen (1)", + "children": [ + { + "nodeName": "Filter", + "simpleString": "Filter (length(trim(value#0, None)) > 0)", + "children": [ + { + "nodeName": "InputAdapter", + "simpleString": "InputAdapter", + "children": [ + { + "nodeName": "Scan text ", + "simpleString": "FileScan text [value#0] Batched: false, DataFilters: [(length(trim(value#0, None)) > 0)], Format: Text, Location: InMemoryFileIndex(1 paths)[<s3-csv-file>], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<value:string>", + "children": [], + "metadata": { + "Location": "InMemoryFileIndex(1 paths)[<s3-csv-file>]", + "ReadSchema": "struct<value:string>", + "Format": "Text", + "Batched": "false", + "PartitionFilters": "[]", + "PushedFilters": "[]", + "DataFilters": "[(length(trim(value#0, None)) > 0)]" + }, + "metrics": [ + { + "name": "number of output rows from cache", + "accumulatorId": 14, + "metricType": "sum" + }, + { + "name": "number of files read", + "accumulatorId": 15, + "metricType": "sum" + }, + { + "name": "metadata time", + "accumulatorId": 16, + "metricType": "timing" + }, + { + "name": "size of files read", + "accumulatorId": 17, + "metricType": "size" + }, + { + "name": "max size of file split", + "accumulatorId": 18, + "metricType": "size" + }, + { + "name": "number of output rows", + "accumulatorId": 13, + "metricType": "sum" + } + ] + } + ], + "metadata": {}, + "metrics": [] + } + ], + "metadata": {}, + "metrics": [ + { + "name": "number of output rows", + 
"accumulatorId": 12, + "metricType": "sum" + } + ] + } + ], + "metadata": {}, + "metrics": [ + { + "name": "duration", + "accumulatorId": 11, + "metricType": "timing" + } + ] + } + ], + "metadata": {}, + "metrics": [ + { + "name": "shuffle records written", + "accumulatorId": 9, + "metricType": "sum" + }, + { + "name": "shuffle write time", + "accumulatorId": 10, + "metricType": "nsTiming" + }, + { + "name": "records read", + "accumulatorId": 7, + "metricType": "sum" + }, + { + "name": "local bytes read", + "accumulatorId": 5, + "metricType": "size" + }, + { + "name": "fetch wait time", + "accumulatorId": 6, + "metricType": "timing" + }, + { + "name": "remote bytes read", + "accumulatorId": 3, + "metricType": "size" + }, + { + "name": "local blocks read", + "accumulatorId": 2, + "metricType": "sum" + }, + { + "name": "remote blocks read", + "accumulatorId": 1, + "metricType": "sum" + }, + { + "name": "remote bytes read to disk", + "accumulatorId": 4, + "metricType": "size" + }, + { + "name": "shuffle bytes written", + "accumulatorId": 8, + "metricType": "size" + } + ] + }, + "time": 0, + "modifiedConfigs": {} +} +{ + "Event": "SparkListenerApplicationEnd", + "Timestamp": 0 +}

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 09:06:04
+
+

*Thread Reply:* I think this is related to job.commit() that probably stops context underneath

+ + + +
+ ✅ Sheeri Cabral (Collibra) +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 09:06:33
+
+

*Thread Reply:* This is probably the same bug: https://github.com/OpenLineage/OpenLineage/issues/2513 but manifests differently

+
+ + + + + + + +
+
Labels
+ integration/spark +
+ +
+
Comments
+ 14 +
+ + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Rodrigo Maia + (rodrigo.maia@manta.io) +
+
2024-04-03 09:45:59
+
+

*Thread Reply:* can you try without the job.commit()?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-04-03 09:54:39
+
+

*Thread Reply:* Sure!

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-04-03 09:56:31
+
+

*Thread Reply:* BTW it makes sense that if the spark listener is disabled, the openlineage integration shouldn’t even try. (If we removed that line, it doesn’t feel like the integration would actually work….)

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 09:57:51
+
+

*Thread Reply:* you mean removing this? +conf.set("spark.extraListeners","io.openlineage.spark.agent.OpenLineageSparkListener")\ +if you don't set it, none of our code is actually being loaded

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Rodrigo Maia + (rodrigo.maia@manta.io) +
+
2024-04-03 09:59:25
+
+

*Thread Reply:* I meant removing the job.init and job.commit for testing purposes. Glue should work without that.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-04-03 12:47:03
+
+

*Thread Reply:* We removed job.commit, same error. Should we also remove job.init?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-04-03 12:48:06
+
+

*Thread Reply:* Won’t removing this change the functionality? +job.init(args[‘JOB_NAME’], args)

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 13:22:11
+
+

*Thread Reply:* interesting - maybe something else stops the job explicitly underneath on Glue?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-04-03 13:38:02
+
+

*Thread Reply:* Will have a look.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
DEEVITH NAGRAJ + (deevithraj435@gmail.com) +
+
2024-04-03 23:09:10
+
+

*Thread Reply:* Hi all, +I'm working with Sheeri on this, so couple of queries,

+ +
  1. tried to set("spark.openlineage.transport.location","/sample.txt>") then the job succeeds but no output in the sample.txt file. (however there are some files created in /sparkHistoryLogs and /sparkHistoryLogs/output), I don't see the OL output file here.
    +2. set("spark.openlineage.transport.type","console") the job fails with “UNCLASSIFIED_ERROR; An error occurred while calling o110.load. Cannot call methods on a stopped SparkContext.”

  3. if we are using http as transport.type, then can we use basic auth instead of api_key?

+
+ + + +
+ ❤️ Sheeri Cabral (Collibra) +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-04 05:32:05
+
+

*Thread Reply:* > 3. if we are using http as transport.type, then can we use basic auth instead of api_key? +Would be good to add that to HttpTransport 🙂

+
+ + + + + + + + + + + + + + + + +
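Until a basic-auth scheme lands in HttpTransport, the api_key auth discussed above can be wired up through Spark properties. This is a sketch only: the endpoint URL and key are placeholders, and the exact property names should be checked against the OpenLineage Spark configuration docs for your version.

```python
# HTTP transport with the api_key auth scheme currently supported by
# OpenLineage's HttpTransport. The URL and key below are placeholders.
http_transport_conf = {
    "spark.openlineage.transport.type": "http",
    "spark.openlineage.transport.url": "http://lineage-backend:5000",  # placeholder
    "spark.openlineage.transport.auth.type": "api_key",
    "spark.openlineage.transport.auth.apiKey": "REPLACE_ME",           # placeholder
}

# Render as spark-submit flags for convenience.
for key, value in sorted(http_transport_conf.items()):
    print(f"--conf {key}={value}")
```

A contributed basic-auth option would presumably follow the same `spark.openlineage.transport.auth.*` pattern.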
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-04 05:33:16
+
+

*Thread Reply:* > 1. tried to set("spark.openlineage.transport.location","<|s3:<s3bucket>/sample.txt>") then the job succeeds but no output in the sample.txt file. (however there are some files created in /sparkHistoryLogs and /sparkHistoryLogs/output), I dont see the OL output file here.
Yeah, FileTransport does not work with object storage - it needs to be a regular filesystem. I don't know if we can make it work without pulling a lot of dependencies and making it significantly more complex - but of course we'd like to see such a contribution

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-04-04 08:11:44
+
+

*Thread Reply:* @DEEVITH NAGRAJ yes, that’s why the PoC is to have the sparklineage use the transport type of “console” - we can’t save to files in S3.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-04-04 08:12:54
+
+

*Thread Reply:* @DEEVITH NAGRAJ if we can get it to work in console, and CloudWatch shows us openlineage data, then we can change the transport type to an API and set up fluentd to collect the data.

+ +

BTW yesterday another customer got it working in console, and Rodrigo from this thread also saw it working in console, so we know it does work in general 😄

+ + + +
+ 🙌 DEEVITH NAGRAJ +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
DEEVITH NAGRAJ + (deevithraj435@gmail.com) +
+
2024-04-04 11:47:20
+
+

*Thread Reply:* yes Sheeri, I agree we need to get it to work in the console. I don't see anything in CloudWatch, and the error is thrown when we try to set("spark.openlineage.transport.type","console"): the job fails with “UNCLASSIFIED_ERROR; An error occurred while calling o110.load. Cannot call methods on a stopped SparkContext.”

+ +

do we need to specify scala version in .set("spark.jars.packages","io.openlineage:openlineagespark:1.10.2") like .set("spark.jars.packages","io.openlineage:openlineagespark_2.13:1.10.2")? is that causing the issue?

+ + + +
+ ❤️ Sheeri Cabral (Collibra) +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-04-04 14:03:37
+
+

*Thread Reply:* Awesome! We’ve got it so the job succeeds when we set the transport type to “console”. Anyone have any tips on where to find it in CloudWatch? the job itself has a dozen or so different logs and we’re clicking all of them, but maybe there’s an easier way?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mark de Groot + (mdegroot@ilionx.com) +
+
2024-04-03 10:15:27
+
+

Hi everyone, I started implementing OpenLineage in our solution 2 weeks ago, but I've run into some problems and quite frankly I don't understand what I'm doing wrong. +The situation is, we are using Azure Synapse with notebooks and we want to pick up the data lineage. I have found a lot of documentation about Databricks in combination with OpenLineage, but there is not much documentation on Synapse in combination with OpenLineage. I've installed the newest library "openlineage-1.10.2" in the Synapse Apache Spark packages (so far so good). The next step was to configure the Apache Spark configuration; based on a blog I found, I filled in the following properties: +spark.extraListeners - io.openlineage.spark.agent.OpenLineageSparkListener +spark.openlineage.host – <https://functionapp.azurewebsites.net/api/function> +spark.openlineage.namespace – synapse name +spark.openlineage.url.param.code – XXXX +spark.openlineage.version – 1

+ +

I’m not sure if the namespace is good, I think it's the name of synapse? But the moment I want to run the Synapse notebook (creating a simple dataframe) it shows me an error

+ +

Py4JJavaError Traceback (most recent call last) Cell In [5], line 1 ----&gt; 1 df = spark.read.load('<abfss://bronsedomein1@xxxxxxxx.dfs.core.windows.net/adventureworks/vendors.parquet>', format='parquet') **2** display(df) +Py4JJavaError: An error occurred while calling o4060.load. +: org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.

+ +

I can’t figure out what I’m doing wrong, does somebody have a clue?

+ +

Thanks, +Mark

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-03 10:35:46
+
+

*Thread Reply:* this error seems unrelated to openlineage to me, can you try removing all the openlineage related properties from the config and testing this out just to rule that out?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mark de Groot + (mdegroot@ilionx.com) +
+
2024-04-03 10:39:30
+
+

*Thread Reply:* Hey Harel,

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mark de Groot + (mdegroot@ilionx.com) +
+
2024-04-03 10:40:49
+
+

*Thread Reply:* Yes I removed all the related openlineage properties. And (of course 😉 ) it's working fine. But the moment I fill in the properties as mentioned above, it gives me the error.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-03 10:45:41
+
+

*Thread Reply:* thanks for checking, wanted to make sure. 🙂

+ + + +
+ 👍 Mark de Groot +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-03 10:48:03
+
+

*Thread Reply:* can you try only setting +spark.extraListeners = io.openlineage.spark.agent.OpenLineageSparkListener +spark.jars.packages = io.openlineage:openlineage-spark_2.12:1.10.2 +spark.openlineage.transport.type = console +?

+ + + +
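The three properties suggested above can be collected in one place like this. A sketch only: the Scala suffix `_2.12` and version `1.10.2` are assumptions and must match the Scala build of your Spark distribution.

```python
# Minimal Spark properties for emitting OpenLineage events to the console,
# as suggested in the thread. The artifact suffix (_2.12) and version are
# assumptions -- match them to the Scala version your Spark build uses.
openlineage_console_conf = {
    "spark.extraListeners": "io.openlineage.spark.agent.OpenLineageSparkListener",
    "spark.jars.packages": "io.openlineage:openlineage-spark_2.12:1.10.2",
    "spark.openlineage.transport.type": "console",
}

# These would typically be applied when building a session, e.g.:
#   builder = SparkSession.builder.appName("ol-console-test")
#   for key, value in openlineage_console_conf.items():
#       builder = builder.config(key, value)
#   spark = builder.getOrCreate()
for key, value in sorted(openlineage_console_conf.items()):
    print(f"--conf {key}={value}")
```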
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 12:01:20
+
+

*Thread Reply:* @Mark de Groot are you stopping the job using spark.stop() or similar command?

+ + + +
+ 👍 Mark de Groot +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mark de Groot + (mdegroot@ilionx.com) +
+
2024-04-03 12:18:21
+
+

*Thread Reply:* So when i Run the default value in Synapse

+ +
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mark de Groot + (mdegroot@ilionx.com) +
+
2024-04-03 12:19:49
+
+

*Thread Reply:* Everything is working fine, but when I use the following properties +I'm getting an error, when trying e.q to create a Dataframe.

+ +
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-04-03 11:23:31
+
+

@channel + Accenture+Confluent's Open Standards for Data Lineage roundtable is happening on April 25th, featuring: +• Kai Waehner (Confluent) +• @Mandy Chessell (Egeria) +• @Julien Le Dem (OpenLineage) +• @Jens Pfau (Google Cloud) +• @Ernie Ostic (Manta/IBM) +• @Sheeri Cabral (Collibra) +• Austin Kronz (Atlan) +• @Luigi Scorzato (moderator, Accenture) +Not to be missed! Register at the link.

+
+
events.confluent.io
+ + + + + + + + + + + + + + + + + +
+ + + +
+ 🔥 Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Bassim EL Baroudi + (bassim.elbaroudi@gmail.com) +
+
2024-04-03 12:58:12
+
+

Hi everyone, +I'm trying to launch a Spark job with OpenLineage integration. The Spark version is 3.5.0. +The configuration used:

+ +

spark.jars.packages=io.openlineage:openlineage-spark_2.12:1.10.2 +spark.extraListeners=io.openlineage.spark.agent.OpenLineageSparkListener +spark.openlineage.transport.url=http://marquez.dcp.svc.cluster.local:8087 +spark.openlineage.namespace=pyspark +spark.openlineage.transport.type=http +spark.openlineage.facets.disabled="[spark.logicalPlan;]" +spark.openlineage.debugFacet=enabled

+ +

the spark job exits with the following error: +java.lang.NoSuchMethodError: 'org.apache.spark.sql.SQLContext org.apache.spark.sql.execution.SparkPlan.sqlContext()' + at io.openlineage.spark.agent.lifecycle.ContextFactory.createSparkSQLExecutionContext(ContextFactory.java:32) + at io.openlineage.spark.agent.OpenLineageSparkListener.lambda$getSparkSQLExecutionContext$4(OpenLineageSparkListener.java:172) + at java.base/java.util.HashMap.computeIfAbsent(HashMap.java:1220) + at java.base/java.util.Collections$SynchronizedMap.computeIfAbsent(Collections.java:2760) + at io.openlineage.spark.agent.OpenLineageSparkListener.getSparkSQLExecutionContext(OpenLineageSparkListener.java:171) + at io.openlineage.spark.agent.OpenLineageSparkListener.sparkSQLExecStart(OpenLineageSparkListener.java:125) + at io.openlineage.spark.agent.OpenLineageSparkListener.onOtherEvent(OpenLineageSparkListener.java:117) + at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:100) + at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28) + at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) + at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) + at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117) + at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101) + at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105) + at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105) + at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23) + at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) + at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100) + at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96) + at 
org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1356) + at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96) +24/04/03 13:23:39 INFO SparkContext: SparkContext is stopping with exitCode 0. +24/04/03 13:23:39 ERROR Utils: throw uncaught fatal error in thread spark-listener-group-shared +java.lang.NoSuchMethodError: 'org.apache.spark.sql.SQLContext org.apache.spark.sql.execution.SparkPlan.sqlContext()' + at io.openlineage.spark.agent.lifecycle.ContextFactory.createSparkSQLExecutionContext(ContextFactory.java:32) + at io.openlineage.spark.agent.OpenLineageSparkListener.lambda$getSparkSQLExecutionContext$4(OpenLineageSparkListener.java:172) + at java.base/java.util.HashMap.computeIfAbsent(HashMap.java:1220) + at java.base/java.util.Collections$SynchronizedMap.computeIfAbsent(Collections.java:2760) + at io.openlineage.spark.agent.OpenLineageSparkListener.getSparkSQLExecutionContext(OpenLineageSparkListener.java:171) + at io.openlineage.spark.agent.OpenLineageSparkListener.sparkSQLExecStart(OpenLineageSparkListener.java:125) + at io.openlineage.spark.agent.OpenLineageSparkListener.onOtherEvent(OpenLineageSparkListener.java:117) + at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:100) + at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28) + at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) + at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) + at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117) + at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101) + at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105) + at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105) + at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23) + at 
scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) + at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100) + at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96) + at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1356) + at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96) +Exception in thread "spark-listener-group-shared" java.lang.NoSuchMethodError: 'org.apache.spark.sql.SQLContext org.apache.spark.sql.execution.SparkPlan.sqlContext()' + at io.openlineage.spark.agent.lifecycle.ContextFactory.createSparkSQLExecutionContext(ContextFactory.java:32) + at io.openlineage.spark.agent.OpenLineageSparkListener.lambda$getSparkSQLExecutionContext$4(OpenLineageSparkListener.java:172) + at java.base/java.util.HashMap.computeIfAbsent(HashMap.java:1220) + at java.base/java.util.Collections$SynchronizedMap.computeIfAbsent(Collections.java:2760) + at io.openlineage.spark.agent.OpenLineageSparkListener.getSparkSQLExecutionContext(OpenLineageSparkListener.java:171) + at io.openlineage.spark.agent.OpenLineageSparkListener.sparkSQLExecStart(OpenLineageSparkListener.java:125) + at io.openlineage.spark.agent.OpenLineageSparkListener.onOtherEvent(OpenLineageSparkListener.java:117) + at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:100) + at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28) + at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) + at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) + at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117) + at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101) + at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105) + at 
org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105) + at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23) + at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) + at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100) + at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96) + at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1356)

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-04 02:29:34
+
+

*Thread Reply:* Hey @Bassim EL Baroudi, in what environment are you running the Spark job? Is this some real-life production job or are you able to provide a code snippet which reproduces it?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-04 03:31:29
+
+

*Thread Reply:* Do you get any OpenLineage events like START events and see this exception at the end of the job, or does it occur at the beginning, resulting in no events emitted?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-04-03 16:16:41
+
+

@channel +This month’s TSC meeting is next Wednesday the 10th at 9:30am PT. +On the tentative agenda (additional items TBA): +• announcements + ◦ upcoming events including the Accenture+Confluent roundtable on 4/25 +• recent release highlights +• discussion items + ◦ supporting job-to-job, as opposed to job-dataset-job, dependencies in the spec + ◦ improving naming +• open discussion +More info and the meeting link can be found on the website. All are welcome! Do you have a discussion topic, use case or integration you’d like to demo? Reply here or DM me to be added to the agenda.

+
+
openlineage.io
+ + + + + + + + + + + + + + + +
+ + + +
+ 👍 Paweł Leszczyński, Sheeri Cabral (Collibra), Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 22:19:15
+
+

Hi! How can I pass multiple Kafka brokers when using Flink? It appears Marquez doesn't allow namespaces with commas.

+ +

namespace 'broker1,broker2,broker3' must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), at (@), plus (+), dashes (-), colons (:), equals (=), semicolons (;), slashes (/) or dots (.) with a maximum length of 1024 characters.

+ + + +
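The validation rule quoted in the error can be mirrored as a quick local check. This is a sketch derived from the error text above, not Marquez's actual validation code:

```python
import re

# Allowed characters per the Marquez error message quoted above:
# letters, numbers, _, @, +, -, :, =, ;, /, . -- up to 1024 characters.
NAMESPACE_RE = re.compile(r"[A-Za-z0-9_@+\-:=;/.]{1,1024}")

def is_valid_namespace(ns: str) -> bool:
    return NAMESPACE_RE.fullmatch(ns) is not None

print(is_valid_namespace("broker1,broker2,broker3"))  # comma is rejected -> False
print(is_valid_namespace("broker1;broker2;broker3"))  # semicolons pass -> True
```

This is why a semicolon-separated broker list would pass where a comma-separated one fails.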
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-04 02:36:19
+
+

*Thread Reply:* Kafka dataset naming already has an open issue -> https://github.com/OpenLineage/OpenLineage/issues/560

+ +

I think the problem you raised deserves a separate one. Feel free to create it. I think we can still modify the broker separator to a semicolon.

+
+ + + + + + + +
+
Comments
+ 1 +
+ + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Fabio Manganiello + (fabio.manganiello@booking.com) +
+
2024-04-04 17:46:31
+
+

FYI I've moved https://github.com/OpenLineage/OpenLineage/pull/2489 to https://github.com/OpenLineage/OpenLineage/pull/2578 - I mistakenly included a couple of merge commits upon git rebase --signoff. Hopefully the tests should pass now (there were a couple of macro templates that still reported the old arguments). Is it still in time to be squeezed inside 1.11.0? It's not super-crucial (for us at least), since we already have copied the code of those macros in our operators implementation, but since the same fix has already been merged on the Airflow side it'd be good to keep things in sync (cc @Maciej Obuchowski @Kacper Muda)

+
+ + + + + + + + + + + + + + + + +
+
+ + + + + + + + + + + + + + + + +
+ + + +
+ 👀 Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Fabio Manganiello + (fabio.manganiello@booking.com) +
+
2024-04-04 18:43:05
+
+

*Thread Reply:* The tests are passing now

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-05 01:37:57
+
+

I wanted to ask if there is any roadmap for adding more support for Flink sources and sinks to OpenLineage, for example: +• Kinesis +• Hudi +• Iceberg SQL +• Flink CDC +• Opensearch +or how one can contribute to those?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Kacper Muda + (kacper.muda@getindata.com) +
+
2024-04-05 02:48:41
+
+

*Thread Reply:* Hey, if you feel like contributing, take a look at our contributors guide 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 07:14:55
+
+

*Thread Reply:* I think the most important thing on the Flink side is working with the Flink community on implementing https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener - as this allows us to move the implementation to the dedicated connectors

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
dolfinus + (martinov_m_s_@mail.ru) +
+
2024-04-05 09:47:22
+
+

👋 Hi everyone!

+ + + +
+ 👋 Michael Robinson, Jakub Dardziński, Harel Shein, Damien Hawes +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 09:56:35
+
+

*Thread Reply:* Hello 👋

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-04-05 11:30:01
+
+

@channel +We released OpenLineage 1.11.3, featuring a new package to support built-in lineage in Spark extensions and a telemetry mechanism in the Spark integration, among many other additions and fixes. +Additions +• Common: add support for SCRIPT-type jobs in BigQuery #2564 @kacpermuda +• Spark: support for built-in lineage extraction #2272 @pawel-big-lebowski +• Spark/Java: add support for Micrometer metrics #2496 @mobuchowski +• Spark: add support for telemetry mechanism #2528 @mobuchowski +• Spark: support query option on table read #2556 @mobuchowski +• Spark: change SparkPropertyFacetBuilder to support recording Spark runtime #2523 @Ruihua98 +• Spec: add fileCount to dataset stat facets #2562 @dolfinus +There were also many bug fixes -- please see the release notes for details. +Thanks to all the contributors with a shout out to new contributor @dolfinus (who contributed 5 PRs to the release and already has 4 more open!) and @Maciej Obuchowski and @Jakub Dardziński for the after-hours CI fixes! +Release: https://github.com/OpenLineage/OpenLineage/releases/tag/1.11.3 +Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md +Commit history: https://github.com/OpenLineage/OpenLineage/compare/1.10.2...1.11.3 +Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage +PyPI: https://pypi.org/project/openlineage-python/

+ + + +
+ 🔥 Maciej Obuchowski, Jorge, taosheng shi, Ricardo Gaspar +
+ +
+ 🚀 Maciej Obuchowski, taosheng shi +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
taosheng shi + (taoshengshi01@gmail.com) +
+
2024-04-05 12:21:34
+
+

👋 Hi everyone!

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
taosheng shi + (taoshengshi01@gmail.com) +
+
2024-04-05 12:22:10
+
+

*Thread Reply:* This is Taosheng from GitData Labs (https://gitdata.ai/), and we are building a data versioning tool for responsible AI/ML:

+ +

A Git-like version control file system for data lineage & data collaboration. +https://github.com/GitDataAI/jiaozifs

+
+
gitdata.ai
+ + + + + + + + + + + + + + + +
+
+ + + + + + + +
+
Website
+ <https://jiaozifs.com> +
+ +
+
Stars
+ 34 +
+ + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 12:23:38
+
+

*Thread Reply:* hello 👋

+ + + +
+ 👋 taosheng shi +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
taosheng shi + (taoshengshi01@gmail.com) +
+
2024-04-05 12:26:56
+
+

*Thread Reply:* I came across OpenLineage on Google and think I would be able to contribute with our products & skills. I was thinking maybe I could start sharing some of them here, and seeing if there is something that feels like it could be interesting to co-build on/through OpenLineage and co-market together.

+ + + +
+ ❤️ Sheeri Cabral (Collibra) +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
taosheng shi + (taoshengshi01@gmail.com) +
+
2024-04-05 12:27:06
+
+

*Thread Reply:* Would somebody be open to discuss any open opportunities for us together?

+ + + +
+ 👍 Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-04-05 14:55:20
+
+

*Thread Reply:* 👋 welcome and thanks for joining!

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-08 03:02:10
+
+

Hi everyone! I wanted to implement cross-stack data lineage across Flink and Spark, but it seems that the Iceberg table gets registered as different datasets in both (Spark at the top, Flink at the bottom), so it doesn't get added to the same DAG. In Spark, the Iceberg table gets the database added in the name. I'm seeing that @Paweł Leszczyński committed Spark/Flink Unify Dataset naming from URI objects (https://github.com/OpenLineage/OpenLineage/pull/2083/files#). So not sure what could be going on

+ + +
+ + + + + + + + + +
+
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-08 04:53:53
+
+

*Thread Reply:* Looks like this method https://github.com/OpenLineage/OpenLineage/blob/1.11.3/integration/spark/shared/src/main/java/io/openlineage/spark/agent/util/PathUtils.java#L164 creates name with (tb+database)

+ +

In general, I would say we should add naming convention here -> https://openlineage.io/docs/spec/naming/ . I think db.table format is fine as we're using it for other sources.

+ +

IcebergSinkVisitor in the Flink integration does not seem to add a symlink facet pointing to the Iceberg table with schema included. You can try extending it with a dataset symlink facet as done for Spark.

+
+
openlineage.io
+ + + + + + + + + + + + + + + +
+
+ + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-08 06:35:59
+
+

*Thread Reply:* How do you suggest we do so? Creating a PR extending IcebergSinkVisitor, or doing it manually through Spark as in this example https://github.com/OpenLineage/workshops/blob/main/spark/dataset_symlinks.ipynb

+
+ + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-08 07:26:35
+
+

*Thread Reply:* is there any way to create a symlink via marquez api?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-08 07:26:44
+
+

*Thread Reply:* trying to figure out whats the easiest approach

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-08 07:44:54
+
+

*Thread Reply:* there are two possible conventions for pointing to an Iceberg dataset: +• its physical location +• namespace pointing to the Iceberg catalog, name pointing to schema+table +The Flink integration uses the physical location only. IcebergSinkVisitor should add an additional facet - the dataset symlink facet

+ + + +
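The symlink facet being discussed is just extra JSON attached to the dataset. As a sketch using plain dicts shaped like the OpenLineage SymlinksDatasetFacet spec (all namespace/name values below are hypothetical examples), a Flink-emitted Iceberg dataset could carry:

```python
# A dataset entry carrying a symlinks facet, following the shape of the
# OpenLineage SymlinksDatasetFacet spec: the dataset keeps its
# physical-location identity, while the symlink points at the catalog's
# schema.table identity. All values below are hypothetical examples.
def dataset_with_symlink(phys_ns, phys_name, catalog_ns, table_name):
    return {
        "namespace": phys_ns,
        "name": phys_name,
        "facets": {
            "symlinks": {
                "identifiers": [
                    {"namespace": catalog_ns, "name": table_name, "type": "TABLE"}
                ]
            }
        },
    }

ds = dataset_with_symlink(
    "s3://warehouse", "/iceberg/db/table",   # physical location (hypothetical)
    "iceberg://glue-catalog", "db.table",    # catalog identity (hypothetical)
)
print(ds["facets"]["symlinks"]["identifiers"][0]["name"])  # -> db.table
```

With both Spark and Flink emitting the same symlink identifier, a backend like Marquez can resolve the two physical names to one logical dataset.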
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-08 07:46:37
+
+

*Thread Reply:* just like spark integration is doing +here -> https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/shared/src/main/java/io/openlineage/spark/agent/util/PathUtils.java#L86

+
+ + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-08 15:01:10
+
+

*Thread Reply:* I have been testing by first modifying the event that gets emitted, but in the lineage I am seeing duplicate datasets, as the physical location for Flink is also different from the one Spark uses

diff --git a/channel/github-discussions/index.html b/channel/github-discussions/index.html index 874f225..83e1886 100644 --- a/channel/github-discussions/index.html +++ b/channel/github-discussions/index.html @@ -5930,6 +5930,696 @@

Group Direct Messages

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ + + + +
+ +
Santiago Cobos + (santiago.cobos@ibm.com) +
+
2024-03-25 16:42:24
+
+

@Santiago Cobos has joined the channel

+ + + +
+
+
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ + + + +
+ +
Ray Lacerda + (ray.lacerda@live.com) +
+
2024-03-27 21:42:13
+
+

@Ray Lacerda has joined the channel

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Ray Lacerda + (ray.lacerda@live.com) +
+
2024-03-27 21:42:20
+
+

@Ray Lacerda has joined the channel

+ + + +
+
+
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/channel/github-notifications/index.html b/channel/github-notifications/index.html index 4e9929a..e9331b5 100644 --- a/channel/github-notifications/index.html +++ b/channel/github-notifications/index.html @@ -39756,6 +39756,1104 @@

Group Direct Messages

diff --git a/channel/mark-grover/index.html b/channel/mark-grover/index.html
index c4f5a14..2fee4fe 100644
--- a/channel/mark-grover/index.html
+++ b/channel/mark-grover/index.html
@@ -745,6 +745,39 @@

Group Direct Messages

+ +
+
+ +
+ + +
+ +
+ +
+
2021-01-27 14:29:14
+
+ + +
+ + + + + + + +
+ + +
+
+
+
+ + diff --git a/channel/open-lineage-plus-bacalhau/index.html b/channel/open-lineage-plus-bacalhau/index.html index 3abe056..9dd5569 100644 --- a/channel/open-lineage-plus-bacalhau/index.html +++ b/channel/open-lineage-plus-bacalhau/index.html @@ -1113,6 +1113,32 @@

Group Direct Messages

+ + +
+
+ + + + +
+ +
Santiago Cobos + (santiago.cobos@ibm.com) +
+
2024-03-25 16:42:35
+
+

@Santiago Cobos has joined the channel

+ + + +
+
+
+
+ + +
diff --git a/channel/providence-meetup/index.html b/channel/providence-meetup/index.html index 42c6879..d72d07a 100644 --- a/channel/providence-meetup/index.html +++ b/channel/providence-meetup/index.html @@ -534,6 +534,77 @@

Group Direct Messages

+ +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2023-03-10 10:17:34
+
+ + + + + + + +
+ 🙌 Michael Robinson +
+ +
+ 👋 Eric Veleker +
+ +
+
+
+
+ + diff --git a/channel/sf-meetup/index.html b/channel/sf-meetup/index.html index e6bae61..34da086 100644 --- a/channel/sf-meetup/index.html +++ b/channel/sf-meetup/index.html @@ -581,20 +581,28 @@

Group Direct Messages

Some pictures from last night

- + - + + +
- + - + + +
diff --git a/channel/spark-support-multiple-scala-versions/index.html b/channel/spark-support-multiple-scala-versions/index.html index 3329ce6..86c6b44 100644 --- a/channel/spark-support-multiple-scala-versions/index.html +++ b/channel/spark-support-multiple-scala-versions/index.html @@ -3055,12 +3055,12 @@

5. We have to educate users about this, that they need to carefully select w

*Thread Reply:* We use a lot of Seq and I doubt it's the only place we'll have problems

- + - - + @@ -4490,12 +4490,12 @@

5. We have to educate users about this, that they need to carefully select w

*Thread Reply:*

- + - - + @@ -5233,7 +5233,7 @@

5. We have to educate users about this, that they need to carefully select w

*Thread Reply:*

- + @@ -6480,7 +6480,7 @@

5. We have to educate users about this, that they need to carefully select w

*Thread Reply:* (The 2.13.2 migration is because I force Jackson to 2.13.2)

- + @@ -7266,6 +7266,39 @@

5. We have to educate users about this, that they need to carefully select w + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-01-23 10:08:54
+
+ + +
+ + + + + + + +
+ + +
+
+
+
+ + @@ -10145,12 +10178,12 @@

5. We have to educate users about this, that they need to carefully select w

*Thread Reply:* https://github.com/features/packages#pricing

- + - - + @@ -12884,7 +12917,7 @@

5. We have to educate users about this, that they need to carefully select w

*Thread Reply:* I have this reference chain

- + @@ -13803,12 +13836,12 @@

5. We have to educate users about this, that they need to carefully select w

*Thread Reply:*

- + - - + @@ -13946,12 +13979,12 @@

5. We have to educate users about this, that they need to carefully select w

*Thread Reply:* but it's the same on recent main I think. This is main build from 6 days ago https://app.circleci.com/pipelines/github/OpenLineage/OpenLineage/9160/workflows/33a4d308-d0e6-4d75-a06b-7d8ef89bb1fe and SparkIcebergIntegrationTest is present there

- + - - + @@ -14091,12 +14124,12 @@

5. We have to educate users about this, that they need to carefully select w

*Thread Reply:*

- + - - + @@ -20585,12 +20618,12 @@

5. We have to educate users about this, that they need to carefully select w

*Thread Reply:*

- + - - + @@ -22206,12 +22239,12 @@

5. We have to educate users about this, that they need to carefully select w

*Thread Reply:* Aye, but I didn't want this:

- + - - + @@ -24436,6 +24469,466 @@

5. We have to educate users about this, that they need to carefully select w + + +
+
+ + + + +
+ +
Mattia Bertorello + (mattia.bertorello@booking.com) +
+
2024-02-29 05:23:22
+
+

Hey team!

+ +

Another discussion: I created an MSK transport; let me know what you think. +With this transport, OL users can use MSK with IAM authentication without defining a custom transport.

+ +

https://github.com/OpenLineage/OpenLineage/pull/2478

+
+ + + + + + + +
+
Labels
+ client/python +
+ + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-02-29 05:28:30
+
+

*Thread Reply:* would be great if you could confirm that you tested this manually

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mattia Bertorello + (mattia.bertorello@booking.com) +
+
2024-02-29 05:30:55
+
+

*Thread Reply:* I tested it. I can show some screenshots 🙂 +I had to create a small Python script, ship everything in a Docker container, and run it on a machine with network connectivity to MSK 😅

+ + + +
+ 👍 Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-02-29 05:32:40
+
+

*Thread Reply:* I believe you, it's just that it would be too expensive time-wise to have real integration tests for each of those transports, so we have to rely on people manually testing it 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mattia Bertorello + (mattia.bertorello@booking.com) +
+
2024-02-29 05:39:51
+
+

*Thread Reply:* Yeah, you need an AWS account, some Terraform code to create and destroy the MSK cluster, plus the integration test has to run inside the VPC network 😅

+ +

But it makes sense to put some screenshots in the PR just to show that it was tested and how.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mattia Bertorello + (mattia.bertorello@booking.com) +
+
2024-02-29 05:40:18
+
+

*Thread Reply:* The only thing to test is the IAM auth, because other than that it's normal Kafka

+ + + +
+ 👍 Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mattia Bertorello + (mattia.bertorello@booking.com) +
+
2024-02-29 07:20:41
+
+

*Thread Reply:* test code +```import datetime +import uuid

+ +

from openlineage.client import OpenLineageClient +from openlineage.client.run import Job, Run, RunEvent, RunState +from openlineage.client.transport import MSKIAMTransport +from openlineage.client.transport.msk_iam import MSKIAMConfig

+ +

if __name__ == "__main__": + import logging

+ +
logging.basicConfig(level=logging.DEBUG)
+config = MSKIAMConfig(
+    config={
+        "bootstrap.servers": "b-2.xxx.c2.kafka.eu-west-2.amazonaws.com:9098,b-1.xxx.c2.kafka.eu-west-2.amazonaws.com:9098"
+    },
+    topic="my_test_topic",
+    region="eu-west-2",
+    flush=True,
+)
+transport = MSKIAMTransport(config)
+client = OpenLineageClient(transport=transport)
+event = RunEvent(
+    eventType=RunState.START,
+    eventTime=datetime.datetime.now().isoformat(),
+    run=Run(runId=str(uuid.uuid4())),
+    job=Job(namespace="kafka", name="test"),
+    producer="prod",
+    schemaURL="schema/RunEvent",
+)
+
+client.emit(event)
+client.transport.producer.flush(timeout=1)
+print("Messages sent")```
+
+ +

Logs +DEBUG:openlineage.client.transport.kafka:BRKMAIN [rdkafka#producer-1] [thrd:sasl_ssl://b-1.xxx.c2.kafka.eu-west-2]: sasl_<ssl://b-1.xxx.c2.kafka.eu-west-2.amazonaws.com:9098/bootstrap>: Enter main broker thread +2024-02-29T12:14:47.560285672Z DEBUG:openlineage.client.transport.kafka:CONNECT [rdkafka#producer-1] [thrd:sasl_ssl://b-1.xxx.c2.kafka.eu-west-2]: sasl_<ssl://b-1.xxx.c2.kafka.eu-west-2.amazonaws.com:9098/bootstrap>: Received CONNECT op +2024-02-29T12:14:47.560288447Z DEBUG:openlineage.client.transport.kafka:STATE [rdkafka#producer-1] [thrd:sasl_ssl://b-1.xxx.c2.kafka.eu-west-2]: sasl_<ssl://b-1.xxx.c2.kafka.eu-west-2.amazonaws.com:9098/bootstrap>: Broker changed state INIT -> TRY_CONNECT +2024-02-29T12:14:47.560291862Z DEBUG:openlineage.client.transport.kafka:BROADCAST [rdkafka#producer-1] [thrd:sasl_ssl://b-1.xxx.c2.kafka.eu-west-2]: Broadcasting state change +2024-02-29T12:14:47.560294645Z DEBUG:openlineage.client.transport.kafka:TOPIC [rdkafka#producer-1] [thrd:app]: New local topic: my_test_topic +2024-02-29T12:14:47.560297342Z DEBUG:openlineage.client.transport.kafka:TOPPARNEW [rdkafka#producer-1] [thrd:app]: NEW my_test_topic [-1] 0x5598e047bbf0 refcnt 0x5598e047bc80 (at rd_kafka_topic_new0:472) +2024_02_29T12:14:47.560300475Z DEBUG:openlineage.client.transport.kafka:BRKMAIN [rdkafka#producer-1] [thrd:app]: Waking up waiting broker threads after setting OAUTHBEARER token +2024-02-29T12:14:47.560303259Z DEBUG:openlineage.client.transport.kafka:WAKEUP [rdkafka#producer-1] [thrd:app]: sasl_<ssl://b-1.xxx.c2.kafka.eu-west-2.amazonaws.com:9098/bootstrap>: Wake-up: OAUTHBEARER token update +2024-02-29T12:14:47.560306334Z DEBUG:openlineage.client.transport.kafka:WAKEUP [rdkafka#producer-1] [thrd:app]: Wake-up sent to 1 broker thread in state >= TRY_CONNECT: OAUTHBEARER token update +2024-02-29T12:14:47.560309239Z DEBUG:openlineage.client.transport.kafka:CONNECT [rdkafka#producer-1] [thrd:sasl_ssl://b-1.xxx.c2.kafka.eu-west-2]: 
sasl_<ssl://b-1.xxx.c2.kafka.eu-west-2.amazonaws.com:9098/bootstrap>: broker in state TRY_CONNECT connecting +2024-02-29T12:14:47.560312101Z DEBUG:openlineage.client.transport.kafka:STATE [rdkafka#producer-1] [thrd:sasl_ssl://b-1.xxx.c2.kafka.eu-west-2]: sasl_<ssl://b-1.xxx.c2.kafka.eu-west-2.amazonaws.com:9098/bootstrap>: Broker changed state TRY_CONNECT -> CONNECT +... +DEBUG:openlineage.client.transport.kafka:PRODUCE [rdkafka#producer-1] [thrd:sasl_ssl://b-1.xxx.c2.kafka.eu-west-2]: sasl_<ssl://b-1.xxx.c2.kafka.eu-west-2.amazonaws.com:9098/1>: my_test_topic [0]: Produce MessageSet with 1 message(s) (349 bytes, ApiVersion 7, MsgVersion 2, MsgId 0, BaseSeq -1, PID{Invalid}, uncompressed) +2024-02-29T12:14:48.326364842Z DEBUG:openlineage.client.transport.kafka:SEND [rdkafka#producer-1] [thrd:sasl_ssl://b-1.xxx.c2.kafka.eu-west-2]: sasl_<ssl://b-1.xxx.c2.kafka.eu-west-2.amazonaws.com:9098/1>: Sent ProduceRequest (v7, 454 bytes @ 0, CorrId 5) +2024-02-29T12:14:48.382471756Z DEBUG:openlineage.client.transport.kafka:RECV [rdkafka#producer-1] [thrd:sasl_ssl://b-1.xxx.c2.kafka.eu-west-2]: sasl_<ssl://b-1.xxx.c2.kafka.eu-west-2.amazonaws.com:9098/1>: Received ProduceResponse (v7, 102 bytes, CorrId 5, rtt 55.99ms) +2024-02-29T12:14:48.382517219Z DEBUG:openlineage.client.transport.kafka:MSGSET [rdkafka#producer-1] [thrd:sasl_ssl://b-1.xxx.c2.kafka.eu-west-2]: sasl_<ssl://b-1.xxx.c2.kafka.eu-west-2.amazonaws.com:9098/1>: my_test_topic [0]: MessageSet with 1 message(s) (MsgId 0, BaseSeq -1) delivered +2024-02-29T12:14:48.382623532Z DEBUG:openlineage.client.transport.kafka:Send message <cimpl.Message object at 0x7fb116fcde40> +2024-02-29T12:14:48.382648622Z DEBUG:openlineage.client.transport.kafka:Amount of messages left in Kafka buffers after flush 0 +2024-02-29T12:14:48.382730647Z DEBUG:openlineage.client.transport.kafka:WAKEUP [rdkafka#producer-1] [thrd:app]: sasl_<ssl://b-1.xxx.c2.kafka.eu-west-2.amazonaws.com:9098/1>: Wake-up: flushing +2024-02-29T12:14:48.382747018Z 
DEBUG:openlineage.client.transport.kafka:WAKEUP [rdkafka#producer-1] [thrd:app]: Wake-up sent to 1 broker thread in state >= UP: flushing +2024-02-29T12:14:48.382752798Z Messages sent

+ + + +
+ 👍 Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mattia Bertorello + (mattia.bertorello@booking.com) +
+
2024-02-29 08:39:50
+
+

*Thread Reply:* I copied it from the Kafka transport +https://github.com/OpenLineage/OpenLineage/pull/2478#discussion_r1507361123 +It makes sense because otherwise, when Python reads the whole file, it could import a library that doesn't exist even when you don't need it.
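The deferred-import approach described above can be sketched as follows. This is an illustrative pattern only (the `lazy_import` helper is hypothetical, not part of the PR): the optional dependency is resolved at call time, so merely importing the client module never fails when the extra isn't installed.

```python
import importlib

def lazy_import(module_name: str):
    """Resolve an optional dependency only when it is actually needed."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # Surface a hint about the missing extra instead of a bare failure.
        raise ImportError(
            f"{module_name} is an optional dependency; "
            f"install the matching extra to use this transport"
        ) from err
```

With this shape, `confluent-kafka` (for example) is only imported inside the transport's constructor, never at module import time.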

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mattia Bertorello + (mattia.bertorello@booking.com) +
+
2024-02-29 08:42:43
+
+

*Thread Reply:* Also, I think it's better to drop support for IMDSv1, and in any case I should implement IMDSv2 😅 to be complete +https://github.com/OpenLineage/OpenLineage/pull/2478#discussion_r1507359486
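For context on what IMDSv2 support involves: it is session-based, requiring a PUT for a token and then passing that token on every metadata GET. A minimal sketch of building those two requests with the standard library (helper names are illustrative, not part of the PR):

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254"  # link-local instance metadata endpoint

def build_token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    # IMDSv2 step 1: PUT a session-token request carrying a TTL header.
    return urllib.request.Request(
        f"{IMDS_BASE}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )

def build_metadata_request(path: str, token: str) -> urllib.request.Request:
    # IMDSv2 step 2: GET metadata with the session token attached.
    return urllib.request.Request(
        f"{IMDS_BASE}/latest/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )
```

IMDSv1, by contrast, is the same GET without the token handshake, which is why dropping it simplifies the code.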

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mattia Bertorello + (mattia.bertorello@booking.com) +
+
2024-03-05 04:10:45
+
+

*Thread Reply:* Hi @Kacper Muda, +Is there still something to do in this PR?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Rajat + (rajat.movaliya@atlan.com) +
+
2024-03-18 09:52:32
+
+

@Rajat has joined the channel

+ + + +
+
+
+
+ + +

diff --git a/index.html b/index.html index bcc1320..0f79395 100644 --- a/index.html +++ b/index.html @@ -7307,11 +7307,15 @@

Group Direct Messages

Is it the case that OpenLineage defines the general framework but doesn’t actually enforce push- or pull-based implementations, and it just so happens that the reference implementation (Marquez) uses push?

@@ -8043,7 +8047,7 @@

Group Direct Messages

*Thread Reply:*

- + @@ -8078,7 +8082,7 @@

Group Direct Messages

*Thread Reply:*

- + @@ -8958,11 +8962,15 @@

Supress success

*Thread Reply:*

@@ -9685,7 +9693,7 @@

Supress success

*Thread Reply:*

- + @@ -11863,11 +11871,15 @@

Supress success

Build on main passed (edited)

@@ -12784,6 +12796,43 @@

Supress success

+ +
+
+ + + + +
+ +
Luke Smith + (luke.smith@kinandcarta.com) +
+
2021-08-20 13:57:41
+
+ + +
+ + + + + + + + + +
+ + +
+
+
+
+ + @@ -12856,11 +12905,15 @@

Supress success

I added this configuration to my cluster :

@@ -12891,11 +12944,15 @@

Supress success

I receive this error message:

@@ -13097,11 +13154,15 @@

Supress success

*Thread Reply:*

@@ -13251,11 +13312,15 @@

Supress success

Now I have this:

@@ -13416,11 +13481,15 @@

Supress success

*Thread Reply:* Hi , @Luke Smith, thank you for your help, are you familiar with this error in azure databricks when you use OL?

@@ -13451,11 +13520,15 @@

Supress success

*Thread Reply:*

@@ -13508,11 +13581,15 @@

Supress success

@@ -17922,11 +17999,15 @@

Supress success

*Thread Reply:* Successfully got a basic prefect flow working

@@ -22372,29 +22453,41 @@

Supress success

I also see exceptions in Marquez logs

@@ -22847,11 +22940,15 @@

Supress success

Hey there, I’m not sure why I’m getting the below error after I ran OPENLINEAGE_URL=<http://localhost:5000> dbt-ol run, although running the command dbt debug doesn’t show any error. Please help.

@@ -23166,20 +23263,28 @@

Supress success

*Thread Reply:* Actually, I had to use a venv; that fixed the above issue. However, I ran into another problem, which is no jobs / datasets found in Marquez:

@@ -23422,11 +23527,15 @@

Supress success

*Thread Reply:*

@@ -24252,20 +24361,28 @@

Supress success

@@ -24322,11 +24439,15 @@

Supress success

*Thread Reply:* oh got it, since its in default, i need to click on it and choose my dbt profile’s account name. thnx

@@ -24357,11 +24478,15 @@

Supress success

*Thread Reply:* May I know why these highlighted ones don't have a schema? FYI, I used sources in dbt.

@@ -24418,11 +24543,15 @@

Supress success

*Thread Reply:* I prepared this yaml file, not sure this is what u asked

@@ -27866,11 +27995,15 @@

Supress success

I have a dag that contains 2 tasks:

@@ -28832,11 +28965,15 @@

Supress success

@@ -28867,11 +29004,15 @@

Supress success

It created 3 namespaces. One was the one that I pointed to in the spark config property. The other 2 are the bucket that we are writing to () and the bucket where we are reading from ()

@@ -28928,11 +29069,15 @@

Supress success

If I enter one of the weird generated jobs, I can see this:

@@ -28963,11 +29108,15 @@

Supress success

*Thread Reply:* This job with no output is a symptom of the output not being understood. You should be able to see the facets for that job. There will be a spark_unknown facet with more information about the problem. If you put that into an issue with some more details about this job, we should be able to help.

@@ -29026,11 +29175,15 @@

Supress success

If I check the logs of marquez-web and marquez I can't see any error there

@@ -29061,11 +29214,15 @@

Supress success

When I try to open the job fulfilments.execute_insert_into_hadoop_fs_relation_command I see this window:

@@ -30882,11 +31039,15 @@

Supress success

I cannot see a graph of my job now. Is this something to do with the namespace names?

@@ -30995,11 +31156,15 @@

Supress success

*Thread Reply:* Here's what I mean:

- + - + + +
@@ -31226,7 +31391,7 @@

Supress success

*Thread Reply:* This is an example Lineage event JSON I am sending.

- + @@ -35361,29 +35526,41 @@

Supress success

Emitting OpenLineage events: 100%|██████████████████████████████████████████████████████| 12/12 [00:00<00:00, 12.50it/s]

@@ -35554,56 +35731,80 @@

Supress success

*Thread Reply:* There are two types of failures: tests failed on the stage model (relationships), and a physical error in the master model (no table with that name). The stage test node in Marquez does not show any indication of failures, and the dataset node indicates failure but without the number of failed records or the table name for persistent test storage. The failed master model shows in red but with no details of the failure. Master model tests were skipped because of the model failure, but the UI reports "Complete".

@@ -35638,20 +35839,28 @@

Supress success

And for dbt test failures, to better visualize that an error is happening, for example like this:

@@ -35823,11 +36032,15 @@

Supress success

Hello everyone, I'm learning OpenLineage. I am trying to connect with Airflow 2. Is it possible, or is that version not yet released? This is what Airflow is currently throwing:

@@ -36077,6 +36290,43 @@

Supress success

+ +
+
+ + + + +
+ +
David Virgil + (david.virgil.naranjo@googlemail.com) +
+
2022-01-11 12:23:41
+
+ + + + + +
+
+
+
+ + @@ -36360,11 +36610,15 @@

Supress success

@@ -36834,11 +37088,15 @@

Supress success

*Thread Reply:* It needs to show Docker Desktop is running :

@@ -37154,20 +37412,28 @@

Supress success

@@ -39803,7 +40069,7 @@

Supress success

I've attached the logs and a screenshot of what I'm seeing in the Spark UI. If you get a chance to take a look: it's a bit verbose, but I'd appreciate a second pair of eyes on my analysis. Hopefully I got something wrong 😅

- + @@ -39812,11 +40078,15 @@

Supress success

@@ -39983,11 +40253,15 @@

Supress success

@@ -40596,7 +40870,7 @@

Supress success

*Thread Reply:* This is the one I wrote:

- + @@ -41169,11 +41443,15 @@

Supress success

*Thread Reply:* however I can not fetch initial data when login into the endpoint

@@ -41681,11 +41959,15 @@

Supress success

https://files.slack.com/files-pri/T01CWUYP5AR-F036JKN77EW/image.png

@@ -43154,11 +43436,15 @@

Supress success

@Kevin Mellott Hello Kevin, sorry to bother you again. I was finally able to configure Marquez in AWS using an ALB. Now I am receiving this error when calling the API

@@ -44042,11 +44328,15 @@

Supress success

Am I supposed to see this when I open Marquez for the first time on an empty database?

@@ -44433,11 +44723,15 @@

Supress success

Do I follow these steps?

@@ -44549,11 +44843,15 @@

Supress success

Do I use OpenLineageURL or Marquez_URL?

@@ -44883,11 +45181,15 @@

logger = logging.getLogger(name)

@@ -48303,11 +48605,15 @@

logger = logging.getLogger(name)

Hi everyone, can someone please help me to debug this error? Thank you very much, all

@@ -49555,11 +49861,15 @@

logger = logging.getLogger(name)

Hello everyone, I'm learning OpenLineage. I finally achieved the connection between Airflow 2+ and OpenLineage+Marquez. The issue is that I don't see anything in Marquez. Do I need to modify the current Airflow operators?

@@ -49642,11 +49952,15 @@

logger = logging.getLogger(name)

value: data-dev```

@@ -49704,11 +50018,15 @@

logger = logging.getLogger(name)

*Thread Reply:* Thanks, it was my error after all. I created a dummy DAG to see if maybe it's an issue with the DAG, and now I can see something in Marquez

@@ -49824,7 +50142,7 @@

logger = logging.getLogger(name)

Any thoughts?

- + @@ -49833,7 +50151,7 @@

logger = logging.getLogger(name)

- + @@ -50911,11 +51229,15 @@

logger = logging.getLogger(name)

happy to share the slides with you if you want 👍 here’s a PDF:

@@ -51028,11 +51350,15 @@

logger = logging.getLogger(name)

Your periodic reminder that GitHub stars are one of those trivial things that make a significant difference for an OS project like ours. Have you starred us yet?

@@ -53756,11 +54082,15 @@

logger = logging.getLogger(name)

The picture is my custom extractor, it's not doing anything currently as this is just a test.

@@ -53843,11 +54173,15 @@

logger = logging.getLogger(name)

*Thread Reply:*

@@ -53959,11 +54293,15 @@

logger = logging.getLogger(name)

This is a similar setup as Michael had in the video.

@@ -54438,11 +54776,15 @@

[2022-05-19, 15:24:50 UTC] {taskinstance.py:1357} INFO -

I was testing https://github.com/MarquezProject/marquez/tree/main/examples/airflow#step-21-create-dag-counter, and the following error was observed in my airflow env:

@@ -55966,38 +56308,54 @@

[2022-05-19, 15:24:50 UTC] {taskinstance.py:1357} INFO -

Please reach me out if you have any questions!

@@ -56482,20 +56840,28 @@

[2022-05-19, 15:24:50 UTC] {taskinstance.py:1357} INFO -

Hi all, I have a question about lineage. I am now running Airflow 2.3.1 and have started the latest Marquez service via docker-compose. I found that, using the example Airflow DAG, I can only see the job information, but not the lineage of the job. How can I configure it to see the lineage?

@@ -57725,20 +58091,28 @@

[2022-05-19, 15:24:50 UTC] {taskinstance.py:1357} INFO -

Hello all, after sending dbt openlineage events to Marquez, I am now looking to use the Marquez API to extract the lineage information. I am able to use python requests to call the Marquez API to get other information such as namespaces, datasets, etc., but I am a little bit confused about what I need to enter to get the lineage. I included screenshots for what the API reference shows regarding retrieving the lineage where it shows that a nodeId is required. However, this is where I seem to be having problems. It is not exactly clear where the nodeId needs to be set or what the nodeId needs to include. I would really appreciate any insights. Thank you!

@@ -57797,11 +58171,15 @@

[2022-05-19, 15:24:50 UTC] {taskinstance.py:1357} INFO -

*Thread Reply:* You can do this in a few ways (that I can think of). First, by looking for a namespace, then querying for the datasets in that namespace:

@@ -57832,11 +58210,15 @@

[2022-05-19, 15:24:50 UTC] {taskinstance.py:1357} INFO -

*Thread Reply:* Or you can search, if you know the name of the dataset:
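To make the nodeId format discussed above concrete: Marquez expects it as <type>:<namespace>:<name>, e.g. dataset:food_delivery:public.delivery_7_days. A small sketch of building the lineage query URL (the helper name and example values are illustrative):

```python
from urllib.parse import urlencode

def lineage_url(base: str, node_type: str, namespace: str,
                name: str, depth: int = 20) -> str:
    # nodeId is "<type>:<namespace>:<name>"; type is "dataset" or "job".
    node_id = f"{node_type}:{namespace}:{name}"
    query = urlencode({"nodeId": node_id, "depth": depth})
    return f"{base}/api/v1/lineage?{query}"
```

The resulting URL can then be fetched with python requests, just like the namespace and dataset endpoints.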

@@ -60640,6 +61022,43 @@

[2022-05-19, 15:24:50 UTC] {taskinstance.py:1357} INFO -

+ +
+
+ + + + +
+ +
Conor Beverland + (conorbev@gmail.com) +
+
2022-06-28 20:05:54
+
+ + +
+ + + + + + + + + +
+ + +
+
+
+
+ + @@ -60668,6 +61087,43 @@

[2022-05-19, 15:24:50 UTC] {taskinstance.py:1357} INFO -

+ +
+
+ + + + +
+ +
Conor Beverland + (conorbev@gmail.com) +
+
2022-06-28 20:07:27
+
+ + +
+ + + + + + + + + +
+ + +
+
+
+
+ + @@ -63015,11 +63471,15 @@

[2022-05-19, 15:24:50 UTC] {taskinstance.py:1357} INFO -

Check this out folks: MarkLogic Data Hub flow lineage into OL/Marquez with jobs and runs and more. I would guess this is a pretty narrow use case, but it went together really smoothly and I thought I'd share. Sometimes it's just cool to see what people are working on

@@ -64118,11 +64578,15 @@

[2022-05-19, 15:24:50 UTC] {taskinstance.py:1357} INFO -

Hi all, I have been playing around with Marquez for a hackday. I have been able to get some lineage information loaded in (using the local docker version for now). I have been trying to set the location (for the link) and description information for a job (the text saying "Nothing to show here"), but I haven't been able to figure out how to do this using the /lineage api. Any help would be appreciated.

@@ -65110,11 +65574,15 @@

SundayFunday

Putting together some internal training for OpenLineage and highlighting some of the areas that have been useful to me on my journey with OpenLineage. Many thanks to @Michael Collado, @Maciej Obuchowski, and @Paweł Leszczyński for the continued technical support and guidance.

@@ -65257,20 +65725,28 @@

SundayFunday

Hi all, I'd really appreciate it if anyone could help. I have been trying to create a POC project with OpenLineage and dbt; attached is the pip list of the OpenLineage packages that I have. However, when I run the "dbt-ol" command, it prompted "open as file" instead of running as a command. The regular dbt run can be executed without issue. I would like to know what I have done wrong or if there is any configuration that I have missed. Thanks a lot

@@ -65649,7 +66125,7 @@

SundayFunday

./gradlew :shared:spotlessApply &amp;&amp; ./gradlew :app:spotlessApply &amp;&amp; ./gradlew clean build test

- + @@ -66401,11 +66877,15 @@

SundayFunday

maybe another question for @Paweł Leszczyński: I was watching the Airflow summit talk that you and @Maciej Obuchowski did ( very nice! ). How is this exposed? I'm wondering if it shows up as an edge on the graph in Marquez? ( I guess it may be tracked as a parent run and if so probably does not show on the graph directly at this time? )

@@ -66869,11 +67349,15 @@

SundayFunday

*Thread Reply:*

@@ -68877,11 +69361,15 @@

SundayFunday

*Thread Reply:* After I send COMPLETE event with the same information I can see the dataset.

@@ -68945,11 +69433,15 @@

SundayFunday

In this example I've added my-test-input on START and my-test-input2 on COMPLETE :

@@ -71716,11 +72208,15 @@

SundayFunday

Here is the Marquez UI

- + - + + +
@@ -72430,11 +72926,15 @@

SundayFunday

*Thread Reply:*

@@ -77177,11 +77677,15 @@

SundayFunday

*Thread Reply:* Apparently the value is hard-coded somewhere in the code that I couldn't figure out, but at least I learnt that on my Mac, where port 5000 is being held up, it can be freed by following the simple step below.

@@ -84818,11 +85322,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

But if I am not in a virtual environment, it installs the packages in my PYTHONPATH. You might try this to see if the dbt-ol script can be found in one of the directories in sys.path.

@@ -84853,11 +85361,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:* this can help you verify that your PYTHONPATH and PATH are correct - installing an unrelated python command-line tool and seeing if you can execute it:

@@ -89933,11 +90445,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:*

@@ -93252,11 +93768,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

Hi team, I’m seeing the create data source and dataset APIs marked as deprecated. Can anyone point me to how to create datasets via API calls?

@@ -94211,11 +94731,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

Is it possible to add column level lineage via api? Let's say I have fields A,B,C from my-input, and A,B from my-output, and B,C from my-output-s3. I want to see, filter, or query by the column name.

@@ -97313,11 +97837,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

23/04/20 10:00:15 INFO ConsoleTransport: {"eventType":"START","eventTime":"2023-04-20T10:00:15.085Z","run":{"runId":"ef4f46d1-d13a-420a-87c3-19fbf6ffa231","facets":{"spark.logicalPlan":{"producer":"https://github.com/OpenLineage/OpenLineage/tree/0.22.0/integration/spark","schemaURL":"https://openlineage.io/spec/1-0-5/OpenLineage.json#/$defs/RunFacet","plan":[{"class":"org.apache.spark.sql.catalyst.plans.logical.CreateTableAsSelect","num-children":2,"name":0,"partitioning":[],"query":1,"tableSpec":null,"writeOptions":null,"ignoreIfExists":false},{"class":"org.apache.spark.sql.catalyst.analysis.ResolvedTableName","num-children":0,"catalog":null,"ident":null},{"class":"org.apache.spark.sql.catalyst.plans.logical.Project","num-children":1,"projectList":[[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num_children":0,"name":"workorderid","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-cl

@@ -99066,11 +99594,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

Hi, I'm new to OpenLineage and I'm trying to connect a Snowflake database with Marquez using Airflow. I'm getting an error in etl_openlineage while running the Airflow DAG in a local Ubuntu environment, and I am unable to see the Marquez UI even after etl_openlineage has completed successfully.

@@ -99101,11 +99633,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:* What's the extract_openlineage.py file? Looks like your code?

@@ -99670,11 +100206,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:* This is my log in Airflow; can you please provide more info on it.

@@ -99735,20 +100275,28 @@

MARQUEZAPIKEY=[YOURAPIKEY]

App listening on port 3000!

@@ -99827,20 +100375,28 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:*

@@ -101255,11 +101811,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

Be on the lookout for an announcement about the next meetup!

@@ -101795,11 +102355,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

I have configured OpenLineage with Databricks and it is sending events to Marquez as expected. I have a notebook which joins 3 tables and writes the resulting data frame to an Azure ADLS location. Each time I run the notebook manually, it creates two start events and two complete events for one run, as shown in the screenshot. Is this expected, or am I missing something?

@@ -102859,11 +103423,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

I have a use case where we are connecting to an Azure SQL database from Databricks to extract, transform, and load data to Delta tables. I could see the lineage is getting built, but there is no column-level lineage even though it's a 1:1 mapping from the source. Could you please check and update on this.

@@ -102977,7 +103545,7 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:* Here is the code we use.

- + @@ -104093,7 +104661,7 @@

MARQUEZAPIKEY=[YOURAPIKEY]

@Paweł Leszczyński @Michael Robinson

- + @@ -108410,11 +108978,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

I can see my job there, but when I click on the job, when it's supposed to show lineage, it's just an empty screen

@@ -108535,11 +109107,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:* ohh but if i try using the console output, it throws ClientProtocolError

@@ -108596,11 +109172,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:* this is the dev console in browser

@@ -108831,11 +109411,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:* marquez didnt get updated

@@ -109339,6 +109923,43 @@

MARQUEZAPIKEY=[YOURAPIKEY]

+ +
+
+ + + + +
+ +
Rachana Gandhi + (rachana.gandhi410@gmail.com) +
+
2023-06-08 11:11:46
+ +
+
+
+ + @@ -110042,11 +110663,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

@@ -110077,11 +110702,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:* @Michael Robinson When we follow the documentation without changing anything and run sudo ./docker/up.sh, we are seeing the following errors:

@@ -110112,11 +110741,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:* So, I edited the up.sh file, modified the docker compose command by removing the --log-level flag, ran sudo ./docker/up.sh, and found the following errors:

@@ -110147,11 +110780,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:* Then I copied .env.example to .env, since compose needs a .env file

@@ -110182,11 +110819,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:* I got this error:

@@ -110273,11 +110914,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:* @Michael Robinson Then it kind of worked but seeing following errors:

@@ -110308,11 +110953,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:*

@@ -110656,11 +111305,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:*

@@ -111536,7 +112189,7 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:* This is the event generated for above query.

- + @@ -111607,7 +112260,7 @@

MARQUEZAPIKEY=[YOURAPIKEY]

this is event for view for which no lineage is being generated

- + @@ -112022,11 +112675,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

It was great meeting/catching up with everyone. Hope to see you and more new faces at the next one!

@@ -112830,11 +113487,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

@@ -116216,11 +116877,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

Hi, I am running a job in Marquez with 180 rows of metadata but it is running for more than an hour. Is there a way to check the log on Marquez? Below is the screenshot of the job:

@@ -116278,11 +116943,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:* Also, yes, we have an event viewer that allows you to query the raw OL events

@@ -116339,7 +117008,7 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:*

- + @@ -117118,11 +117787,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

but the page is empty

@@ -117452,11 +118125,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

I can now see this

@@ -117487,11 +118164,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:* but when i click on the job i then get this

@@ -117548,11 +118229,15 @@

MARQUEZAPIKEY=[YOURAPIKEY]

*Thread Reply:* @George Polychronopoulos Hi, I am facing the same issue. After adding spark conf and using the docker run command, marquez is still showing empty. Do I need to change something in the run command?

@@ -119539,11 +120224,15 @@

Marquez as an OpenLineage Client

@@ -119976,20 +120665,28 @@

Marquez as an OpenLineage Client

@@ -121039,7 +121736,7 @@

Marquez as an OpenLineage Client

Expected. vs Actual.

- + @@ -121048,7 +121745,7 @@

Marquez as an OpenLineage Client

- + @@ -121066,6 +121763,56 @@

Marquez as an OpenLineage Client

+ +
+
+ + + + +
+ +
GitHubOpenLineageIssues + (githubopenlineageissues@gmail.com) +
+
2023-08-07 11:21:04
+
+ + +
+ + + + + + + + + +
+
+ + + + + + + + + +
+ + +
+
+
+
+ + @@ -124136,20 +124883,28 @@

Marquez as an OpenLineage Client

The OL-spark version is matching the Spark version? Is there a known issues with the Spark / OL versions ?

@@ -124345,20 +125100,28 @@

csv_file = location.csv

Part of the logs with the OL configurations and the processed event

@@ -124462,11 +125225,15 @@

csv_file = location.csv

@@ -125033,11 +125800,15 @@

csv_file = location.csv

*Thread Reply:* I assume the problem is somewhere there, not on the level of facet definition, since SchemaDatasetFacet looks pretty much the same and it works

@@ -125157,11 +125928,15 @@

csv_file = location.csv

*Thread Reply:*

@@ -125192,11 +125967,15 @@

csv_file = location.csv

*Thread Reply:* I think the code here filters out those string values in the list

@@ -125426,11 +126205,15 @@

csv_file = location.csv

@@ -126480,12 +127263,12 @@

csv_file = location.csv

let me update the branch and test again

- + - - + @@ -126744,6 +127527,130 @@

csv_file = location.csv

+
+
+ + + + +
+ +
savan + (SavanSharan_Navalgi@intuit.com) +
+
2024-02-28 06:02:00
+
+

*Thread Reply:* @Paweł Leszczyński @Maciej Obuchowski +can you please approve this CI to run integration tests? +https://app.circleci.com/pipelines/github/OpenLineage/OpenLineage/9497/workflows/4a20dc95-d5d1-4ad7-967c-edb6e2538820

+ + + +
+ 👍 Paweł Leszczyński +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
savan + (SavanSharan_Navalgi@intuit.com) +
+
2024-02-29 01:13:11
+
+

*Thread Reply:* @Paweł Leszczyński +only 2 Spark versions are sending empty +input and output +for both START and COMPLETE events

+ +
+

• 3.4.2 + • 3.5.0 + I can look into the above if you guide me a bit on how to. + Should I open a new ticket for it? + Please suggest how to proceed?

+
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
savan + (SavanSharan_Navalgi@intuit.com) +
+
2024-03-01 04:01:45
+
+

*Thread Reply:* this integration test case led to the discovery of the above bug for Spark 3.4.2 and 3.5.0 +will that be a blocker to merging this test case? +@Paweł Leszczyński @Maciej Obuchowski

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
savan + (SavanSharan_Navalgi@intuit.com) +
+
2024-03-06 09:01:44
+
+

*Thread Reply:* @Paweł Leszczyński @Maciej Obuchowski +any direction on the above blocker would be helpful.

+ + + +
+
+
+
+ + + + +
@@ -127691,11 +128598,15 @@

csv_file = location.csv

I was doing this a second ago and this ended up with Caused by: java.lang.ClassNotFoundException: io.openlineage.spark.agent.OpenLineageSparkListener not found in com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader@1609ed55

@@ -128866,11 +129777,15 @@

csv_file = location.csv

*Thread Reply:* Can you please share with me your json conf for the cluster ?

@@ -128901,11 +129816,15 @@

csv_file = location.csv

*Thread Reply:* It's because in mu build file I have

@@ -128936,11 +129855,15 @@

csv_file = location.csv

*Thread Reply:* and the one that was copied is

@@ -132181,20 +133104,28 @@

csv_file = location.csv

Hello, I'm currently in the process of following the instructions outlined in the provided getting started guide at https://openlineage.io/getting-started/. However, I've encountered a problem while attempting to complete *Step 1* of the guide. Unfortunately, I'm encountering an internal server error at this stage. I did manage to successfully run Marquez, but it appears that there might be an issue that needs to be addressed. I have attached screenshots.

@@ -132251,11 +133182,15 @@

csv_file = location.csv

*Thread Reply:* @Jakub Dardziński Port 5000 is not taken by any other application. The logs show some errors, but I am not sure what the issue is here.

@@ -134980,11 +135915,15 @@

set the log level for the openlineage spark library

*Thread Reply:* This is the error message:

@@ -135041,11 +135980,15 @@

set the log level for the openlineage spark library

I am trying to run Google Cloud Composer where I have added the openlineage-airflow PyPI package as a dependency and have added the env OPENLINEAGEEXTRACTORS to point to my custom extractor. I have added a folder named dependencies and inside that I have placed my extractor file, and the path given to OPENLINEAGEEXTRACTORS is dependencies.<filename>.<extractorclass_name>…still it fails with the exception saying No module named ‘dependencies’. Can anyone kindly help me correct my mistake?

@@ -135365,11 +136308,15 @@

set the log level for the openlineage spark library

*Thread Reply:*

@@ -135427,11 +136374,15 @@

set the log level for the openlineage spark library

*Thread Reply:*

@@ -135488,11 +136439,15 @@

set the log level for the openlineage spark library

*Thread Reply:* https://openlineage.slack.com/files/U05QL7LN2GH/F05SUDUQEDN/screenshot_2023-09-13_at_5.31.22_pm.png

@@ -135679,7 +136634,7 @@

set the log level for the openlineage spark library

*Thread Reply:* these are the worker pod logs…where there is no log of openlineageplugin

- + @@ -135821,11 +136776,15 @@

set the log level for the openlineage spark library

*Thread Reply:* this is one of the experiments I tried, but then I reverted it back to dependencies.bigqueryinsertjobextractor.BigQueryInsertJobExtractor…where dependencies is a module I have created inside my dags folder

@@ -135856,11 +136815,15 @@

set the log level for the openlineage spark library

*Thread Reply:* https://openlineage.slack.com/files/U05QL7LN2GH/F05RM6EV6DV/screenshot_2023-09-13_at_12.38.55_am.png

@@ -135891,11 +136854,15 @@

set the log level for the openlineage spark library

*Thread Reply:* these are the logs of the triggerer pod specifically

@@ -135978,11 +136945,15 @@

set the log level for the openlineage spark library

*Thread Reply:* these are the logs of the worker pod at startup, where it does not complain about the plugin like the triggerer does, but when tasks are run on this worker…somehow it is not picking up the extractor for the operator that I have written it for

@@ -136272,11 +137243,15 @@

set the log level for the openlineage spark library

*Thread Reply:* I have changed the dags folder where I added the init file as you suggested and then updated OPENLINEAGEEXTRACTORS to bigqueryinsertjob_extractor.BigQueryInsertJobExtractor…still the same thing
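For readers following this thread, the dotted-path convention being debugged can be sketched as follows — a minimal illustration only, assuming (as the messages suggest) that the extractor env var holds dotted paths whose last segment is the class name; the example path is hypothetical:

```python
# Hedged sketch: split a dotted extractor path (as passed via the extractor
# env var discussed above) into the importable module path and class name.
def split_extractor_path(dotted: str):
    module_path, _, class_name = dotted.rpartition(".")
    return module_path, class_name

# Hypothetical path mirroring the one in the thread; the "No module named
# 'dependencies'" error means the module part here is not on the import path.
print(split_extractor_path(
    "dependencies.bigquery_insert_job_extractor.BigQueryInsertJobExtractor"
))
```

If the module portion cannot be imported from the worker's PYTHONPATH (e.g. the dags folder lacks an `__init__.py`), registration fails with exactly the `No module named` error reported above.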

@@ -136502,11 +137477,15 @@

set the log level for the openlineage spark library

*Thread Reply:* I’ve done the experiment; that’s how GCS looks

@@ -136537,11 +137516,15 @@

set the log level for the openlineage spark library

*Thread Reply:* and env vars

@@ -137171,7 +138154,7 @@

set the log level for the openlineage spark library

- + @@ -137206,7 +138189,7 @@

set the log level for the openlineage spark library

*Thread Reply:*

- + @@ -139336,7 +140319,7 @@

set the log level for the openlineage spark library

I am attaching the log4j, there is no openlineagecontext

- + @@ -140331,47 +141314,67 @@

set the log level for the openlineage spark library

@@ -140422,29 +141425,41 @@

set the log level for the openlineage spark library

*Thread Reply:* A few more pics:

@@ -143258,16 +144273,20 @@

set the log level for the openlineage spark library

@here I am trying out the OpenLineage integration of Spark on Databricks. No event is getting emitted from OpenLineage; I see logs saying OpenLineage Event Skipped. I am attaching the notebook that I am trying to run and the cluster logs. Can someone kindly help me with this?

- + @@ -144823,11 +145842,15 @@

set the log level for the openlineage spark library

*Thread Reply:* @Paweł Leszczyński this is what I am getting

@@ -144858,7 +145881,7 @@

set the log level for the openlineage spark library

*Thread Reply:* attaching the html

- + @@ -145500,11 +146523,15 @@

set the log level for the openlineage spark library

*Thread Reply:* @Paweł Leszczyński you are right. This is what we are doing as well, combining events with the same runId to process the information on our backend. But even so, there are several runIds without this information. I went through these events to have a better view of what was happening. As you can see from 7 runIds, only 3 were showing the "environment-properties" attribute. Some condition is not being met here, or maybe it is what @Jason Yip suspects and there's some sort of filtering of unnecessary events

@@ -146215,11 +147242,15 @@

set the log level for the openlineage spark library

*Thread Reply:* In docker, marquez-api image is not running and exiting with the exit code 127.

@@ -146765,11 +147796,15 @@

set the log level for the openlineage spark library

I'm upgrading the version from openlineage-airflow==0.24.0 to openlineage-airflow 1.4.1 but I'm seeing the following error; any help is appreciated

@@ -147274,11 +148309,15 @@

set the log level for the openlineage spark library

*Thread Reply:* I see the difference in the calls between these 2 versions: the current version checks if Airflow is >2.6 and then directly runs on_running, but the earlier version ran it on a separate thread. Is this what's raising this exception?
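The version-gated dispatch described in that reply can be sketched roughly as below — a hypothetical shape for illustration, not the provider's actual code:

```python
import threading

def dispatch(on_running, airflow_version=(2, 6)):
    """Run a listener hook either inline or on a worker thread.

    Sketch of the behavior difference described above: newer Airflow
    (>= 2.6) calls the hook directly on the caller's thread, while the
    older openlineage-airflow versions ran it on a separate thread.
    """
    if airflow_version >= (2, 6):
        on_running()  # inline: exceptions propagate to the caller
    else:
        # threaded: exceptions stay on the worker thread
        t = threading.Thread(target=on_running)
        t.start()
        t.join()
```

One consequence of the inline path is that an exception raised inside the hook now surfaces in the calling task, which could explain why an error hidden by the threaded version becomes visible after upgrading.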

@@ -148593,7 +149632,7 @@

set the log level for the openlineage spark library

*Thread Reply:*

- + @@ -150141,7 +151180,7 @@

show data

@Paweł Leszczyński I tested 1.5.0, it works great now, but the environment facet is gone in START... which I very much want... any thoughts?

- + @@ -152003,7 +153042,7 @@

show data

@Paweł Leszczyński I went back to 1.4.1, the output does show the ADLS location. But the environment facet is gone in 1.4.1. It shows up in 1.5.0, but the namespace is back to dbfs....

- + @@ -153486,11 +154525,15 @@

show data

like ( file_name, size, modification time, creation time )

@@ -154451,11 +155494,15 @@

show data

execute_spark_script(1, "/home/haneefa/airflow/dags/saved_files/")

@@ -155287,12 +156334,12 @@

Set up SparkSubmitOperator for each query

I was referring to the fluentd openlineage proxy, which lets users copy the event and send it to multiple backends. Fluentd has a list of out-of-the-box output plugins including BigQuery, S3, Redshift and others (https://www.fluentd.org/dataoutputs)
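Conceptually, what such a proxy does is fan one event out to several sinks; a minimal sketch (illustrative only — real fluentd outputs are configured declaratively in its config file, not in code like this):

```python
import copy

def fan_out(event: dict, sinks):
    """Deliver one OpenLineage event to every configured output."""
    for sink in sinks:
        # Each backend gets its own copy so one sink cannot mutate
        # the event another sink sees.
        sink(copy.deepcopy(event))

received = []
fan_out({"eventType": "START"}, [received.append, received.append])
print(len(received))
```

The same pattern is why a fluentd-style proxy is attractive here: producers emit once, and routing to Marquez plus any warehouse sink is purely a configuration concern.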

- + - - + @@ -157316,7 +158363,7 @@

Set up SparkSubmitOperator for each query

*Thread Reply:* This text file contains a total of 10-11 events, including the start and completion events of one of my notebook runs. The process is simply reading from a Hive location and performing a full load to another Hive location.

- + @@ -160042,12 +161089,12 @@

./bin/spark-submit --class "SparkTest" --master local[**] --jars ```

Thanks 🙏

- + - - + @@ -161188,12 +162235,12 @@

./bin/spark-submit --class "SparkTest" --master local[**] --jars ```

*Thread Reply:* in Admin > Plugins can you see whether you have OpenLineageProviderPlugin and if so, are there listeners?

- + - - + @@ -161292,7 +162339,7 @@

./bin/spark-submit --class "SparkTest" --master local[**] --jars ```

*Thread Reply:* Dont

- + @@ -161353,7 +162400,7 @@

./bin/spark-submit --class "SparkTest" --master local[**] --jars ```

*Thread Reply:*

- + @@ -162629,6 +163676,39 @@

./bin/spark-submit --class "SparkTest" --master local[**] --jars ```

+ +
+
+ + + + +
+ +
Zacay Daushin + (zacayd@octopai.com) +
+
2023-12-20 07:25:53
+
+ + +
+ + + + + + + +
+ + +
+
+
+
+ + @@ -163587,12 +164667,12 @@

./bin/spark-submit --class "SparkTest" --master local[**] --jars ```

I've created a pdf with some code samples and OL inputs and output attributes.

- + - - + @@ -163600,7 +164680,7 @@

./bin/spark-submit --class "SparkTest" --master local[**] --jars ```

- + @@ -165675,12 +166755,12 @@

./bin/spark-submit --class "SparkTest" --master local[**] --jars ```

Do we have the functionality to search on the lineage we are getting?

- + - - + @@ -166783,12 +167863,12 @@

./bin/spark-submit --class "SparkTest" --master local[**] --jars ```

*Thread Reply:*

- + - - + @@ -166993,7 +168073,7 @@

Gradle 8.5

- + @@ -167599,12 +168679,12 @@

Gradle 8.5

any suggestions on naming for Graph API sources from Outlook? I pull a lot of data from email attachments with Airflow. Generally I am passing a resource (email address), the mailbox, and a subfolder. From there I list messages and find attachments

- + - - + @@ -168448,12 +169528,12 @@

Gradle 8.5

Hello team, I see the following issue when I install apache-airflow-providers-openlineage==1.4.0

- + - - + @@ -168674,12 +169754,12 @@

Gradle 8.5

is there any solution?

- + - - + @@ -168984,12 +170064,12 @@

Gradle 8.5

- + - - + @@ -169177,6 +170257,32 @@

Gradle 8.5

+
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-18 09:02:34
+
+

*Thread Reply:* @jayant joshi did deleting all volumes work for you, or did you discover another solution? We see users encountering this error from time to time, and it would be helpful to know more.

+ + + +
+
+
+
+ + + + +
@@ -169285,7 +170391,7 @@

Gradle 8.5

- ❤️ Ross Turk, Harel Shein, tati, Rodrigo Maia, Maciej Obuchowski, Jarek Potiuk, Mattia Bertorello + ❤️ Ross Turk, Harel Shein, tati, Rodrigo Maia, Maciej Obuchowski, Jarek Potiuk, Mattia Bertorello, Sheeri Cabral (Collibra)
@@ -169355,12 +170461,12 @@

Gradle 8.5

"spark-submit --conf "spark.extraListeners=io.openlineage.spark.agent.OpenLineageSparkListener" --packages "io.openlineage:openlineagespark:1.7.0" --conf "spark.openlineage.transport.type=http" --conf "spark.openlineage.transport.url= http://marquez-api:5000" --conf "spark.openlineage.namespace=sparkintegration" pyspark_etl.py".

- + - - + @@ -169715,12 +170821,12 @@

Gradle 8.5

*Thread Reply:* Find the attached localhost 5000 & 5001 port results. Note that while running the same code in the Jupyter notebook, I could see lineage on the Marquez UI. Only when running the code through spark-submit am I facing an issue.

- + - - + @@ -169728,12 +170834,12 @@

Gradle 8.5

- + - - + @@ -170234,12 +171340,12 @@

Gradle 8.5

*Thread Reply:* From your code, I could see marquez-api is running successfully at "http://marquez-api:5000". Find attached screenshot.

- + - - + @@ -170485,12 +171591,12 @@

Gradle 8.5

*Thread Reply:* the quickstart guide shows this example and it produces the result with an output node in the results, but when I run this in Databricks I see no output node generated.

- + - - + @@ -170498,12 +171604,12 @@

Gradle 8.5

- + - - + @@ -170579,12 +171685,12 @@

Write the data from the source DataFrame to the destination table

*Thread Reply:* as a result, onkar_table was never recorded as a dataset, hence the lineage between mayur_table and onkar_table was not recorded either

- + - - + @@ -170592,12 +171698,12 @@

Write the data from the source DataFrame to the destination table

- + - - + @@ -170981,12 +172087,12 @@

Write the data from the source DataFrame to the destination table

Thanks.

- + - - + @@ -171473,12 +172579,12 @@

Write the data from the source DataFrame to the destination table

Error Screenshot:

- + - - + @@ -171543,12 +172649,12 @@

Write the data from the source DataFrame to the destination table

Thanks.

- + - - + @@ -171610,12 +172716,12 @@

Write the data from the source DataFrame to the destination table

*Thread Reply:* While composing up an OpenLineage docker-compose.yml, it showed the path to access Jupyter Lab, and through that path I am accessing it. I didn't run any command externally. Find the attached screenshot.

- + - - + @@ -171706,12 +172812,12 @@

Write the data from the source DataFrame to the destination table

I just tried to inspect the notebook container; there I could see "GRANT_SUDO=yes". Even after passing this, it's still asking for the password. Find the attached screenshot. Thanks.

- + - - + @@ -172212,12 +173318,12 @@

Write the data from the source DataFrame to the destination table

listeners should be there under OpenLineageProviderPlugin

- + - - + @@ -172408,12 +173514,12 @@

Write the data from the source DataFrame to the destination table

*Thread Reply:* This is the snapshot of my Plugins. I will also try with the configs which you mentioned.

- + - - + @@ -173078,12 +174184,12 @@

Write the data from the source DataFrame to the destination table

Thanks.

- + - - + @@ -173091,12 +174197,12 @@

Write the data from the source DataFrame to the destination table

- + - - + @@ -173104,12 +174210,12 @@

Write the data from the source DataFrame to the destination table

- + - - + @@ -173417,12 +174523,12 @@

Write the data from the source DataFrame to the destination table

Do you have any idea how to fix this?

- + - - + @@ -173484,12 +174590,12 @@

Write the data from the source DataFrame to the destination table

DETAIL: Role "marquez" does not exist.

- + - - + @@ -173549,12 +174655,12 @@

Write the data from the source DataFrame to the destination table

*Thread Reply:* Probably you might ask this.

- + - - + @@ -173615,12 +174721,12 @@

Write the data from the source DataFrame to the destination table

With this, the above error was gone. But it has an authentication error as below.

- + - - + @@ -174651,12 +175757,12 @@

Write the data from the source DataFrame to the destination table

We have gone through the OpenLineage documentation; from the documentation we could only get the supported Spark versions and data source types. Thanks.

- + - - + @@ -174912,12 +176018,12 @@

Write the data from the source DataFrame to the destination table

*Thread Reply:*

- + - - + @@ -174951,12 +176057,12 @@

Write the data from the source DataFrame to the destination table

*Thread Reply:*

- + - - + @@ -174990,12 +176096,12 @@

Write the data from the source DataFrame to the destination table

*Thread Reply:*

- + - - + @@ -175057,12 +176163,12 @@

Write the data from the source DataFrame to the destination table

I did an airflow backfill job which redownloaded all files from a SFTP (191 files) and each of those are a separate OL dataset. in this view I clicked on a single file, but because it is connected to the "extract" airflow task, it shows all of the files that task downloaded as well (dynamic mapped tasks in Airflow)

- + - - + @@ -176608,6 +177714,94 @@

Write the data from the source DataFrame to the destination table

+
+ :gratitude_thank_you: Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Rodrigo Maia + (rodrigo.maia@manta.io) +
+
2024-03-13 12:43:16
+
+

*Thread Reply:* @Matthew Paras Hi! +I'm still struggling with empty outputs on Databricks with the latest OL version.

+ +

24/03/13 16:35:56 INFO PlanUtils: apply method failed with +org.apache.spark.SparkException: There is no Credential Scope. Current env: Driver

+ +

Any idea on how to solve this?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Rodrigo Maia + (rodrigo.maia@manta.io) +
+
2024-03-13 12:53:44
+
+

*Thread Reply:* Any Databricks runtime version I should test with?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Matthew Paras + (matthewparas2020@u.northwestern.edu) +
+
2024-03-13 15:35:41
+
+

*Thread Reply:* interesting, I think we're running on 13.3 LTS - we also haven't upgraded to the official OL version, still using the patched one that I built

+ + +
@@ -177825,6 +179019,123 @@

Write the data from the source DataFrame to the destination table

+
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-04 13:28:12
+
+

*Thread Reply:* @Athitya Kumar can you tell us if this resolved your issue?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Athitya Kumar + (athityakumar@gmail.com) +
+
2024-03-06 01:30:32
+
+

*Thread Reply:* @Michael Robinson - Yup, it's resolved for event types that are already being emitted from OpenLineage - but we have some events like StageCompleted / TaskEnd etc. where we don't send events currently, where we'd like to plug in our CustomFacets

+ +

https://openlineage.slack.com/archives/C01CK9T7HKR/p1709298185120219?thread_ts=1709297395.323109&cid=C01CK9T7HKR

+
+ + +
+ + + } + + Maciej Obuchowski + (https://openlineage.slack.com/team/U01RA9B5GG2) +
+ + + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-06 12:57:53
+
+

*Thread Reply:* @Athitya Kumar can you store the facets somewhere (like OpenLineageContext) and send them with complete event later?

+ + + +
+
+
+
+ + + + +
@@ -177925,12 +179236,12 @@

Write the data from the source DataFrame to the destination table

*Thread Reply:* here is an example:

- + - - + @@ -178704,12 +180015,12 @@

Write the data from the source DataFrame to the destination table

- + - - + @@ -179063,19 +180374,177 @@

Write the data from the source DataFrame to the destination table

-
+
- + -
+
Max Zheng (mzheng@plaid.com)
-
2024-02-26 12:52:47
+
2024-02-27 13:26:46
+
+

*Thread Reply:* Seems like it's in OpenLineageSparkListener.onJobEnd +```24/02/25 16:12:49 INFO PlanUtils: apply method failed with +java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext. +This stopped SparkContext was created at:

+ +

org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58) +sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) +sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) +sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) +java.lang.reflect.Constructor.newInstance(Constructor.java:423) +py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247) +py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) +py4j.Gateway.invoke(Gateway.java:238) +py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80) +py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69) +py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) +py4j.ClientServerConnection.run(ClientServerConnection.java:106) +java.lang.Thread.run(Thread.java:750)

+ +

The currently active SparkContext was created at:

+ +

org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58) +sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) +sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) +sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) +java.lang.reflect.Constructor.newInstance(Constructor.java:423) +py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247) +py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) +py4j.Gateway.invoke(Gateway.java:238) +py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80) +py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69) +py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) +py4j.ClientServerConnection.run(ClientServerConnection.java:106) +java.lang.Thread.run(Thread.java:750)

+ +
at org.apache.spark.SparkContext.assertNotStopped(SparkContext.scala:121) ~[spark-core_2.12-3.3.1-amzn-0.1.jar:3.3.1-amzn-0.1]
+at org.apache.spark.sql.SparkSession.&lt;init&gt;(SparkSession.scala:113) ~[spark-sql_2.12-3.3.1-amzn-0.1.jar:3.3.1-amzn-0.1]
+at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:962) ~[spark-sql_2.12-3.3.1-amzn-0.1.jar:3.3.1-amzn-0.1]
+at org.apache.spark.sql.SQLContext$.getOrCreate(SQLContext.scala:1023) ~[spark-sql_2.12-3.3.1-amzn-0.1.jar:3.3.1-amzn-0.1]
+at org.apache.spark.sql.SQLContext.getOrCreate(SQLContext.scala) ~[spark-sql_2.12-3.3.1-amzn-0.1.jar:3.3.1-amzn-0.1]
+at org.apache.hudi.client.common.HoodieSparkEngineContext.&lt;init&gt;(HoodieSparkEngineContext.java:65) ~[hudi-spark-bundle.jar:0.12.2-amzn-0]
+at org.apache.hudi.SparkHoodieTableFileIndex.&lt;init&gt;(SparkHoodieTableFileIndex.scala:65) ~[hudi-spark-bundle.jar:0.12.2-amzn-0]
+at org.apache.hudi.HoodieFileIndex.&lt;init&gt;(HoodieFileIndex.scala:81) ~[hudi-spark-bundle.jar:0.12.2-amzn-0]
+at org.apache.hudi.HoodieBaseRelation.fileIndex$lzycompute(HoodieBaseRelation.scala:236) ~[hudi-spark-bundle.jar:0.12.2-amzn-0]
+at org.apache.hudi.HoodieBaseRelation.fileIndex(HoodieBaseRelation.scala:234) ~[hudi-spark-bundle.jar:0.12.2-amzn-0]
+at org.apache.hudi.BaseFileOnlyRelation.toHadoopFsRelation(BaseFileOnlyRelation.scala:153) ~[hudi-spark-bundle.jar:0.12.2-amzn-0]
+at org.apache.hudi.DefaultSource$.resolveBaseFileOnlyRelation(DefaultSource.scala:268) ~[hudi-spark-bundle.jar:0.12.2-amzn-0]
+at org.apache.hudi.DefaultSource$.createRelation(DefaultSource.scala:232) ~[hudi-spark-bundle.jar:0.12.2-amzn-0]
+at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:111) ~[hudi-spark-bundle.jar:0.12.2-amzn-0]
+at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:68) ~[hudi-spark-bundle.jar:0.12.2-amzn-0]
+at io.openlineage.spark.agent.lifecycle.plan.SaveIntoDataSourceCommandVisitor.apply(SaveIntoDataSourceCommandVisitor.java:140) ~[io.openlineage_openlineage-spark-1.6.2.jar:1.6.2]
+at io.openlineage.spark.agent.lifecycle.plan.SaveIntoDataSourceCommandVisitor.apply(SaveIntoDataSourceCommandVisitor.java:47) ~[io.openlineage_openlineage-spark-1.6.2.jar:1.6.2]
+at io.openlineage.spark.api.AbstractQueryPlanDatasetBuilder$1.apply(AbstractQueryPlanDatasetBuilder.java:94) ~[io.openlineage_openlineage-spark-1.6.2.jar:1.6.2]
+at io.openlineage.spark.api.AbstractQueryPlanDatasetBuilder$1.apply(AbstractQueryPlanDatasetBuilder.java:85) ~[io.openlineage_openlineage-spark-1.6.2.jar:1.6.2]
+at io.openlineage.spark.agent.util.PlanUtils.safeApply(PlanUtils.java:279) ~[io.openlineage_openlineage-spark-1.6.2.jar:1.6.2]
+at io.openlineage.spark.api.AbstractQueryPlanDatasetBuilder.lambda$apply$0(AbstractQueryPlanDatasetBuilder.java:75) ~[io.openlineage_openlineage-spark-1.6.2.jar:1.6.2]
+at java.util.Optional.map(Optional.java:215) ~[?:1.8.0_392]
+at io.openlineage.spark.api.AbstractQueryPlanDatasetBuilder.apply(AbstractQueryPlanDatasetBuilder.java:67) ~[io.openlineage_openlineage-spark-1.6.2.jar:1.6.2]
+at io.openlineage.spark.api.AbstractQueryPlanDatasetBuilder.apply(AbstractQueryPlanDatasetBuilder.java:39) ~[io.openlineage_openlineage-spark-1.6.2.jar:1.6.2]
+at io.openlineage.spark.agent.util.PlanUtils.safeApply(PlanUtils.java:279) ~[io.openlineage_openlineage-spark-1.6.2.jar:1.6.2]
+at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.lambda$null$23(OpenLineageRunEventBuilder.java:451) ~[io.openlineage_openlineage-spark-1.6.2.jar:1.6.2]
+at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_392]
+at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175) ~[?:1.8.0_392]
+at java.util.Iterator.forEachRemaining(Iterator.java:116) ~[?:1.8.0_392]
+at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) ~[?:1.8.0_392]
+at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) ~[?:1.8.0_392]
+at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[?:1.8.0_392]
+at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) ~[?:1.8.0_392]
+at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) ~[?:1.8.0_392]
+at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_392]
+at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485) ~[?:1.8.0_392]
+at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:272) ~[?:1.8.0_392]
+at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384) ~[?:1.8.0_392]
+at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) ~[?:1.8.0_392]
+at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[?:1.8.0_392]
+at java.util.stream.StreamSpliterators$WrappingSpliterator.forEachRemaining(StreamSpliterators.java:313) ~[?:1.8.0_392]
+at java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742) ~[?:1.8.0_392]
+at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) ~[?:1.8.0_392]
+at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[?:1.8.0_392]
+at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) ~[?:1.8.0_392]
+at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_392]
+at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566) ~[?:1.8.0_392]
+at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.buildOutputDatasets(OpenLineageRunEventBuilder.java:410) ~[io.openlineage_openlineage-spark-1.6.2.jar:1.6.2]
+at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.populateRun(OpenLineageRunEventBuilder.java:298) ~[io.openlineage_openlineage-spark-1.6.2.jar:1.6.2]
+at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.buildRun(OpenLineageRunEventBuilder.java:281) ~[io.openlineage_openlineage-spark-1.6.2.jar:1.6.2]
+at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.buildRun(OpenLineageRunEventBuilder.java:259) ~[io.openlineage_openlineage-spark-1.6.2.jar:1.6.2]
+at io.openlineage.spark.agent.lifecycle.SparkSQLExecutionContext.end(SparkSQLExecutionContext.java:257) ~[io.openlineage_openlineage-spark-1.6.2.jar:1.6.2]
+at io.openlineage.spark.agent.OpenLineageSparkListener.onJobEnd(OpenLineageSparkListener.java:167) ~[io.openlineage_openlineage-spark-1.6.2.jar:1.6.2]
+at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:39) ~[spark-core_2.12-3.3.1-amzn-0.1.jar:3.3.1-amzn-0.1]
+at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28) ~[spark-core_2.12-3.3.1-amzn-0.1.jar:3.3.1-amzn-0.1]
+at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) ~[spark-core_2.12-3.3.1-amzn-0.1.jar:3.3.1-amzn-0.1]
+at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) ~[spark-core_2.12-3.3.1-amzn-0.1.jar:3.3.1-amzn-0.1]
+at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117) ~[spark-core_2.12-3.3.1-amzn-0.1.jar:3.3.1-amzn-0.1]
+at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101) ~[spark-core_2.12-3.3.1-amzn-0.1.jar:3.3.1-amzn-0.1]
+at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105) ~[spark-core_2.12-3.3.1-amzn-0.1.jar:3.3.1-amzn-0.1]
+at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105) ~[spark-core_2.12-3.3.1-amzn-0.1.jar:3.3.1-amzn-0.1]
+at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23) ~[scala-library-2.12.15.jar:?]
+at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) ~[scala-library-2.12.15.jar:?]
+at <a href="http://org.apache.spark.scheduler.AsyncEventQueue.org">org.apache.spark.scheduler.AsyncEventQueue.org</a>$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100) ~[spark-core_2.12-3.3.1-amzn-0.1.jar:3.3.1-amzn-0.1]
+at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96) ~[spark-core_2.12-3.3.1-amzn-0.1.jar:3.3.1-amzn-0.1]
+at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1447) ~[spark-core_2.12-3.3.1-amzn-0.1.jar:3.3.1-amzn-0.1]
+at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96) ~[spark-core_2.12-3.3.1-amzn-0.1.jar:3.3.1-amzn-0.1]
+
+ +

24/02/25 16:13:04 INFO AsyncEventQueue: Process of event SparkListenerJobEnd(23,1708877534168,JobSucceeded) by listener OpenLineageSparkListener took 15.64437991s.
24/02/25 16:13:04 ERROR JniBasedUnixGroupsMapping: error looking up the name of group 1001: No such file or directory```

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-02-27 19:20:10
-

Lastly, would disabling facets improve performance? eg. disabling spark.logicalPlan

+

*Thread Reply:* Hmm yeah I'm confused, https://github.com/OpenLineage/OpenLineage/blob/1.6.2/integration/spark/shared/src/main/java/io/openlineage/spark/agent/util/PlanUtils.java#L277 seems to indicate as you said (safeApply swallows the exception), but the job exits afterward with an error code (EMR marks the job as failed)

+ +

The crash stops if I remove spark.stop() or disable the OpenLineage listener so this is odd 🤔

+
+ + + + + + + + + + + + + + + + +
@@ -179089,7 +180558,7 @@

Write the data from the source DataFrame to the destination table

-
+
@@ -179099,15 +180568,56 @@

Write the data from the source DataFrame to the destination table

Paweł Leszczyński (pawel.leszczynski@getindata.com)
-
2024-02-27 02:26:44
+
2024-02-28 04:21:31
-

*Thread Reply:* Disabling spark.LogicalPlan may improve performance of populating OL event. It's disabled by default in recent version (the one released yesterday). You can also use circuit breaker feature if you are worried about Ol integration affecting Spark jobs

+

*Thread Reply:* 24/02/25 16:12:49 INFO PlanUtils: apply method failed with -> yeah, log level is info. It would look as if you were trying to run some action after stopping spark, but you said that disabling OpenLineage listener makes it succeed. This is odd.

-
- 🤩 Yannick Libert -
+
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-02-28 13:11:11
+
+

*Thread Reply:* Maybe it's some race condition in the shutdown logic with event listeners? It seems like the listener being enabled is causing executors to be spun up (which fail) after the Spark session is already stopped

+ +

• After the stacktrace above I see ConsoleTransport log some OpenLineage event data +• Then oddly it looks like a bunch of executors are launched after the Spark session has already been stopped +• These executors crash on startup which is likely whats causing the Spark job to exit with an error code +24/02/24 07:18:03 INFO ConsoleTransport: {"eventTime":"2024_02_24T07:17:05.344Z","producer":"<https://github.com/OpenLineage/OpenLineage/tree/1.6.2/integration/spark>", +... +24/02/24 07:18:06 INFO YarnAllocator: Will request 1 executor container(s) for ResourceProfile Id: 0, each with 4 core(s) and 27136 MB memory. with custom resources: &lt;memory:27136, max memory:2147483647, vCores:4, max vCores:2147483647&gt; +24/02/24 07:18:06 INFO YarnAllocator: Submitted 1 unlocalized container requests. +24/02/24 07:18:09 INFO YarnAllocator: Launching container container_1708758297553_0001_01_000004 on host {ip} for executor with ID 3 for ResourceProfile Id 0 with resources &lt;memory:27136, vCores:4&gt; +24/02/24 07:18:09 INFO YarnAllocator: Launching executor with 21708m of heap (plus 5428m overhead/off heap) and 4 cores +24/02/24 07:18:09 INFO YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them. +24/02/24 07:18:09 INFO YarnAllocator: Completed container container_1708758297553_0001_01_000003 on host: {ip} (state: COMPLETE, exit status: 1) +24/02/24 07:18:09 WARN YarnAllocator: Container from a bad node: container_1708758297553_0001_01_000003 on host: {ip}. Exit status: 1. Diagnostics: [2024-02-24 07:18:06.508]Exception from container-launch. +Container id: container_1708758297553_0001_01_000003 +Exit code: 1 +Exception message: Launch container failed +Shell error output: Nonzero exit code=1, error message='Invalid argument number' +The new executors all fail with: +Caused by: org.apache.spark.rpc.RpcEndpointNotFoundException: Cannot find endpoint: <spark://CoarseGrainedScheduler>@{ip}:{port}

+ +
@@ -179119,19 +180629,19 @@

Write the data from the source DataFrame to the destination table

-
+
- +
-
Yannick Libert - (yannick.libert.partner@decathlon.com) +
Max Zheng + (mzheng@plaid.com)
-
2024-02-27 05:20:13
+
2024-02-28 13:44:20
-

*Thread Reply:* This feature is going to be so useful for us! Love it!

+

*Thread Reply:* The debug logs from AsyncEventQueue show OpenLineageSparkListener took 21.301411402s fwiw - I'm assuming that's abnormally long

@@ -179145,51 +180655,334 @@

Write the data from the source DataFrame to the destination table

-
+
- + -
+
-
Michael Robinson - (michael.robinson@astronomer.io) +
Max Zheng + (mzheng@plaid.com)
-
2024-02-26 14:23:37
+
2024-02-28 16:07:37
-

@channel -We released OpenLineage 1.9.1, featuring: -• Airflow: add support for JobTypeJobFacet properties #2412 @mattiabertorello -• dbt: add support for JobTypeJobFacet properties #2411 @mattiabertorello -• Flink: support Flink Kafka dynamic source and sink #2417 @HuangZhenQiu -• Flink: support multi-topic Kafka Sink #2372 @pawel-big-lebowski -• Flink: support lineage for JDBC connector #2436 @HuangZhenQiu -• Flink: add common config gradle plugin #2461 @HuangZhenQiu -• Java: extend circuit breaker loaded with ServiceLoader #2435 @pawel-big-lebowski -• Spark: integration now emits intermediate, application level events wrapping entire job execution #2371 @mobuchowski -• Spark: support built-in lineage within DataSourceV2Relation #2394 @pawel-big-lebowski -• Spark: add support for JobTypeJobFacet properties #2410 @mattiabertorello -• Spark: stop sending spark.LogicalPlan facet by default #2433 @pawel-big-lebowski -• Spark/Flink/Java: circuit breaker #2407 @pawel-big-lebowski -• Spark: add the capability to publish Scala 2.12 and 2.13 variants of openlineage-spark #2446 @d-m-h -A large number of changes and bug fixes were also included. -Thanks to all our contributors with a special shout-out to @Damien Hawes, who contributed >10 PRs to this release! -Release: https://github.com/OpenLineage/OpenLineage/releases/tag/1.9.1 -Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md -Commit history: https://github.com/OpenLineage/OpenLineage/compare/1.8.0...1.9.1 -Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage -PyPI: https://pypi.org/project/openlineage-python/

+

*Thread Reply:* The yarn logs also seem to indicate the listener is somehow causing the app to start up again +2024-02-24 07:18:00,152 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (SchedulerEventDispatcher:Event Processor): container_1708758297553_0001_01_000002 Container Transitioned from RUNNING to COMPLETED +2024-02-24 07:18:00,155 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator (SchedulerEventDispatcher:Event Processor): assignedContainer application attempt=appattempt_1708758297553_0001_000001 container=null queue=default clusterResource=&lt;memory:54272, vCores:8&gt; type=OFF_SWITCH requestedPartition= +2024-02-24 07:18:00,155 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo (SchedulerEventDispatcher:Event Processor): Allocate Updates PendingContainers: 2 Decremented by: 1 SchedulerRequestKey{priority=0, allocationRequestId=0, containerToUpdate=null} for: appattempt_1708758297553_0001_000001 +2024-02-24 07:18:00,155 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (SchedulerEventDispatcher:Event Processor): container_1708758297553_0001_01_000003 Container Transitioned from NEW to ALLOCATED +Is there some logic in the listener that can create a Spark session if there is no active session?

-
- 🚀 Jakub Dardziński, Jackson Goerner, Abdallah, Yannick Libert, Mattia Bertorello, Tristan GUEZENNEC -CROIX-, Fabio Manganiello +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-02-29 03:29:40
+
+

*Thread Reply:* not sure about this, I couldn't find anywhere in the code that does that

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-02-29 05:36:43
+
+

*Thread Reply:* Probably another instance where doing something generic does not work well with Hudi 😶

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-02-29 12:44:24
+
+

*Thread Reply:* Dumb question, what info needs to be fetched from Hudi? Is this in the createRelation call? I'm surprised the logs seem to indicate Hudi table metadata is being read from S3 in the listener

+ +

What would need to be implemented for proper Hudi support?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-02-29 15:06:42
+
+

*Thread Reply:* @Max Zheng well, basically we need at least proper name and namespace for the dataset. How we do that is completely dependent on the underlying code, so probably somewhere here: https://github.com/apache/hudi/blob/3a97b01c0263c4790ffa958b865c682f40b4ada4/hudi-[…]-common/src/main/scala/org/apache/hudi/HoodieBaseRelation.scala

+ +

Most likely we don't need to do any external calls or read anything from S3. It's just done because without something that understands Hudi classes we just do the generic thing (createRelation) that has the biggest chance to work.

+ +

For example, for Iceberg we can get the data required just by getting config from their catalog config - and I think with Hudi it has to work the same way, because logically - if you're reading some table, you have to know where it is or how it's named.

+
+ + + + + + + + + + + + + + + +
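The naming convention discussed in this thread (S3 bucket as the dataset namespace, key prefix as the dataset name, matching the `"namespace": "s3://{bucket}"` / `"name": "{S3 prefix path}"` event excerpt above) can be sketched as a small helper. This is an illustration of the convention only, not OpenLineage code; `dataset_identity` and the example path are hypothetical:

```python
from urllib.parse import urlparse

def dataset_identity(location: str):
    """Split a table location URI into the (namespace, name) pair the events above show."""
    parsed = urlparse(location)
    namespace = f"{parsed.scheme}://{parsed.netloc}"  # e.g. "s3://{bucket}"
    name = parsed.path.lstrip("/")                    # e.g. "{S3 prefix path}"
    return namespace, name

print(dataset_identity("s3://my-bucket/warehouse/hudi/orders"))
# → ('s3://my-bucket', 'warehouse/hudi/orders')
```

The point of a connector-specific code path (rather than the generic `createRelation` fallback) is that this identity can be derived from the table location alone, without reading any metadata from S3.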
-
- 🎉 Abdallah, Mattia Bertorello + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-02-29 16:05:07
+
+

*Thread Reply:* That makes sense, and that info is in the hoodie.properties file that seems to be loaded based on the logs. But the events I see OL generate seem to have S3 path and S3 bucket as the name and namespace respectively - ie. it doesn't seem to be using any of the metadata being read from Hudi? +"outputs": [ + { + "namespace": "s3://{bucket}", + "name": "{S3 prefix path}", +(we'd be perfectly happy with just the S3 path/bucket - is there a way to disable createRelation or have OL treat these Hudi tables as raw parquet?)

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-05 05:58:14
+
+

*Thread Reply:* > But the events I see OL generate seem to have S3 path and S3 bucket as the name and namespace respectively - ie. it doesn't seem to be using any of the metadata being read from Hudi? +Probably yes - as I've said, the OL handling of it is just inefficient and not specific to Hudi. It's good enough that they generate something that seems to be valid dataset naming 🙂 +And, the fact it reads S3 metadata is not intended - it's just that Hudi implements createRelation this way.

+ +
+

(we'd be perfectly happy with just the S3 path/bucket - is there a way to disable createRelation or have OL treat these Hudi as raw parquet?) + The way OpenLineage Spark integration works is by looking at Optimized Logical Plan of particular Spark job. So the solution would be to implement Hudi specific path in SaveIntoDataSourceCommandVisitor or any particular other visitor that touches on the Hudi path - or, if Hudi has their own LogicalPlan nodes, implement support for it.

+
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-05 08:50:14
+
+

*Thread Reply:* (sorry for answering so late @Max Zheng, I thought I had sent the response but it was sitting in my drafts for a few days 😞 )

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-03-06 19:37:32
+
+

*Thread Reply:* Thanks for the explanation @Maciej Obuchowski

+ +

I've been digging into the source code to see if I can help contribute Hudi support for OL. At least in SaveIntoDataSourceCommandVisitor it seems all I need to do is: +```--- a/integration/spark/shared/src/main/java/io/openlineage/spark/agent/lifecycle/plan/SaveIntoDataSourceCommandVisitor.java ++++ b/integration/spark/shared/src/main/java/io/openlineage/spark/agent/lifecycle/plan/SaveIntoDataSourceCommandVisitor.java +@@ -114,8 +114,9 @@ public class SaveIntoDataSourceCommandVisitor + LifecycleStateChange lifecycleStateChange = + (SaveMode.Overwrite == command.mode()) ? OVERWRITE : CREATE;

-    if (command.dataSource().getClass().getName().contains("DeltaDataSource")) {
+    if (command.dataSource().getClass().getName().contains("DeltaDataSource") || command.dataSource().getClass().getName().contains("org.apache.hudi.Spark32PlusDefaultSource")) {
       if (command.options().contains("path")) {
+        log.info("Delta/Hudi data source detected, path: {}", command.options().get("path").get());
         URI uri = URI.create(command.options().get("path").get());
         return Collections.singletonList(
             outputDataset()
@@ -123,6 +124,7 @@ public class SaveIntoDataSourceCommandVisitor
       }
 }```
This seems to work and avoids the `createRelation` call but I still run into the same crash 🤔 so now I'm not sure if this is a Hudi issue. Do you know of any other dependencies on the output data source? I wonder if https://openlineage.slack.com/archives/C01CK9T7HKR/p1708671958295659 rdd events could be the culprit?
  • +
+ +

I'm going to try and reproduce the crash without Hudi and just with parquet

+
+ + +
+ + + } + + Max Zheng + (https://openlineage.slack.com/team/U06L217224C) +
+ + + + + + + + + + + + + + + + +
+ +
@@ -179200,26 +180993,191 @@

Write the data from the source DataFrame to the destination table

-
+
- +
-
Jakub Dardziński - (jakub.dardzinski@getindata.com) +
Max Zheng + (mzheng@plaid.com)
-
2024-02-26 14:33:27
+
2024-03-06 20:24:14
-

*Thread Reply:* Oudstanding work @Damien Hawes 👏

+

*Thread Reply:* Hmm reading over RDDExecutionContext it seems highly unlikely anything in that would cause this crash

-
- ➕ Michael Robinson, Mattia Bertorello, Fabio Manganiello +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-07 04:53:44
+
+

*Thread Reply:* There might be other part related to reading from Hudi?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-07 04:54:22
+
+

*Thread Reply:* SaveIntoDataSourceCommandVisitor only takes care about root node of whole LogicalPlan

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-07 04:57:51
+
+

*Thread Reply:* I would serialize the logical plan and take a look at the leaf nodes of the job that causes the hang

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-07 04:58:05
+
+

*Thread Reply:* for a simple check, you can just make the dataset handler that handles them return early

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-03-07 11:54:39
+
+

*Thread Reply:* https://openlineage.slack.com/archives/C01CK9T7HKR/p1708544898883449?thread_ts=1708541527.152859&cid=C01CK9T7HKR the parsed logical plan for my test job is just the SaveIntoDataSourceCommandVisitor (though I might be misunderstanding what you mean by leaf nodes)

+
+ + +
+ + + } + + Max Zheng + (https://openlineage.slack.com/team/U06L217224C) +
+ + + + + + + + + + + + + + + + +
+ +
@@ -179230,19 +181188,19 @@

Write the data from the source DataFrame to the destination table

-
+
- +
-
Abdallah - (abdallah@terrab.me) +
Max Zheng + (mzheng@plaid.com)
-
2024-02-27 00:39:29
+
2024-03-07 12:12:28
-

*Thread Reply:* Thank you 👏👏

+

*Thread Reply:* I was able to reproduce the issue with InsertIntoHadoopFsRelationCommand with a parquet write with the same job - I'm starting to suspect this is a Spark with Docker/yarn bug

@@ -179256,23 +181214,19 @@

Write the data from the source DataFrame to the destination table

-
+
- + -
+
-
Derya Meral - (drderyameral@gmail.com) +
Maciej Obuchowski + (maciej.obuchowski@getindata.com)
-
2024-02-26 15:04:33
+
2024-03-07 13:17:19
-

Hi all, I'm working on a local Airflow-OpenLineage-Marquez integration using Airflow 2.7.3 and python 3.10. Everything seems to be installed correctly with the appropriate settings. I'm seeing events, jobs, tasks trickle into the UI. I'm using the PostgresOperator. When it's time for the SQL code to be parsed, I'm seeing the following in my Airflow logs: -[2024-02-26, 19:43:17 UTC] {sql.py:457} INFO - Running statement: SELECT CURRENT_SCHEMA;, parameters: None -[2024-02-26, 19:43:17 UTC] {base.py:152} WARNING - OpenLineage provider method failed to extract data from provider. -[2024-02-26, 19:43:17 UTC] {manager.py:198} WARNING - Extractor returns non-valid metadata: None -Can anyone give me pointers on why exactly this might be happening? I've tried also with the SQLExecuteQueryOperator, same result. I previously got a Marquez setup to work with the external OpenLineage package for Airflow with Airflow 2.6.1. But I'm struggling with this newer integrated OpenLineage version

+

*Thread Reply:* Without hudi read?

@@ -179286,21 +181240,19 @@

Write the data from the source DataFrame to the destination table

-
+
- +
-
Jakub Dardziński - (jakub.dardzinski@getindata.com) +
Max Zheng + (mzheng@plaid.com)
-
2024-02-26 15:10:21
+
2024-03-07 13:17:46
-

*Thread Reply:* Does this happen for some particular SQL but works for other? -Also, my understanding is that it worked with openlineage-airflow on Airflow 2.6.1 (the same code)? -What version of OL provider are you using?

+

*Thread Reply:* Yep, it reads json and writes out as parquet

@@ -179314,23 +181266,19 @@

Write the data from the source DataFrame to the destination table

-
+
- +
-
Derya Meral - (drderyameral@gmail.com) +
Max Zheng + (mzheng@plaid.com)
-
2024-02-26 15:20:22
+
2024-03-07 13:18:27
-

*Thread Reply:* I've been using one toy DAG and have only tried with the two operators mentioned. Currently, my team's code doesn't use provider operators so it would not really work well with OL.

- -

Yes, it worked with Airflow 2.6.1. Same code.

- -

Right now, I'm using apache-airflow-providers-openlineage==1.5.0 and the other OL dependencies are at 1.9.1.

+

*Thread Reply:* We're on EMR, so I created an AWS support ticket to ask whether this is a known issue with YARN/Spark on Docker

@@ -179344,19 +181292,19 @@

Write the data from the source DataFrame to the destination table

-
+
- +
-
Jakub Dardziński - (jakub.dardzinski@getindata.com) +
Maciej Obuchowski + (maciej.obuchowski@getindata.com)
-
2024-02-26 15:21:00
+
2024-03-07 13:19:53
-

*Thread Reply:* Would you want to share the SQL statement?

+

*Thread Reply:* Very interesting, would be great to see if we see more data in the metrics in the next release

@@ -179370,30 +181318,19 @@

Write the data from the source DataFrame to the destination table

-
+
- +
-
Derya Meral - (drderyameral@gmail.com) +
Max Zheng + (mzheng@plaid.com)
-
2024-02-26 15:31:42
+
2024-03-07 13:21:17
-

*Thread Reply:* It has some PII in it, but it's basically in the form of: -```DROP TABLE IF EXISTS usersmeral.keyrelations;

- -

CREATE TABLE usersmeral.keyrelations AS

- -

WITH -staff AS ( SELECT ...) -,enabled AS (SELECT ...) -SELECT ... -FROM public.borrowers -LEFT JOIN ...;``` -We're splitting the query with sqlparse.split() and feed it to a PostgresOperator.

+

*Thread Reply:* For sure, if it's on master or if you have a patch I can build the jar and run my job with it if that'd be helpful

@@ -179407,31 +181344,246 @@

Write the data from the source DataFrame to the destination table

-
+
- +
-
Derya Meral - (drderyameral@gmail.com) +
Maciej Obuchowski + (maciej.obuchowski@getindata.com)
-
2024-02-27 09:26:41
+
2024-03-07 13:22:04
-

*Thread Reply:* I thought I should share our configs in case I'm missing something: -```[openlineage] -disabled = False -disabledforoperators =

+

*Thread Reply:* Not yet 😶

+ + + +
+ 🙏 Max Zheng +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-03-11 20:20:14
+
+

*Thread Reply:* After even more investigation I think I found the cause. In https://github.com/OpenLineage/OpenLineage/blob/987e5b806dc8bd6c5aab5f85c97af76a87[…]n/java/io/openlineage/spark/agent/OpenLineageSparkListener.java a SparkListenerSQLExecutionEnd event is processed after the SparkSession is stopped - I believe createSparkSQLExecutionContext is doing something weird in https://github.com/OpenLineage/OpenLineage/blob/987e5b806dc8bd6c5aab5f85c97af76a87[…]n/java/io/openlineage/spark/agent/lifecycle/ContextFactory.java at +SparkSession sparkSession = queryExecution.sparkSession(); +I'm not sure if this is defined behavior, for the session to be accessed after it's stopped? After I skipped the event in onOtherEvent if the session is stopped, it no longer crashes trying to spin up new executors

-

namespace =

+

(I can make a Github issue + try to land a patch if you agree this seems like a bug)

+ + + + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-03-11 21:27:14
+
+

*Thread Reply:* (it might affect all events and this is just the first hit)

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-12 05:55:27
+
+

*Thread Reply:* @Max Zheng is the job particularly short-lived? We've sometimes seen the SparkSession stopped for very short jobs (especially if people close it manually), but it never led to any problems like this deadlock.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-03-12 12:20:12
+
+

*Thread Reply:* I don't think job duration is related (also it's not a deadlock, it's causing the app to crash https://openlineage.slack.com/archives/C01CK9T7HKR/p1709143871823659?thread_ts=1708969888.804979&cid=C01CK9T7HKR) - it failed for a ~1 hour long job and when testing still failed when I sampled the job input with df.limit(10000). It seems like it happens on jobs where events take a long time to process (like > 20s in the other thread).

-

extractors =

+

I added this block to verify its being processed after the Spark context is stopped and to skip

-

config_path = /opt/airflow/openlineage.yml -transport =

+

```+ private boolean isSparkContextStopped() {

+    return asJavaOptional(SparkSession.getDefaultSession()
+            .map(sparkContextFromSession)
+            .orElse(activeSparkContext))
+        .map(
+            ctx -> {
+              return ctx.isStopped();
+            })
+        .orElse(true); // If for some reason we can't get the Spark context, we assume it's stopped
+  }
+
+  @Override
+  public void onOtherEvent(SparkListenerEvent event) {
+    if (isDisabled) {
+      return;
+    }
+    if (isSparkContextStopped()) {
+      log.warn("SparkContext is stopped, skipping event: {}", event.getClass());
+      return;
+    }
This logs and no longer causes the same app to crash
24/03/12 04:57:14 WARN OpenLineageSparkListener: SparkSession is stopped, skipping event: class org.apache.spark.sql.execution.ui.SparkListenerDriverAccumUpdates```
  • +
+
+ + +
+ + + } + + Max Zheng + (https://openlineage.slack.com/team/U06L217224C) +
+ + + + + + + + + + + + + + + + + +
@@ -179445,22 +181597,18891 @@

disablesourcecode = ```

-
+
- +
-
Derya Meral - (drderyameral@gmail.com) +
Maciej Obuchowski + (maciej.obuchowski@getindata.com)
-
2024-02-27 09:27:20
+
2024-03-12 12:29:34
-

*Thread Reply:* The YAML file: -transport: - type: http - url: <http://marquez:5000>

+

*Thread Reply:* might the crash be related to a memory issue?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-12 12:29:48
+
+

*Thread Reply:* ah, I see

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-12 12:31:30
+
+

*Thread Reply:* another question, are you explicitly stopping the SparkSession/SparkContext from within your job?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-03-12 12:31:47
+
+

*Thread Reply:* Yep, it only happens when we explicitly stop with spark.stop()

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-03-13 16:18:23
+
+

*Thread Reply:* Created: https://github.com/OpenLineage/OpenLineage/issues/2513

+
+ + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-02-26 12:52:47
+
+

Lastly, would disabling facets improve performance? eg. disabling spark.logicalPlan

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-02-27 02:26:44
+
+

*Thread Reply:* Disabling spark.LogicalPlan may improve performance of populating OL event. It's disabled by default in recent version (the one released yesterday). You can also use circuit breaker feature if you are worried about Ol integration affecting Spark jobs

+ + + +
+ 🤩 Yannick Libert +
+ +
+
+
+
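The circuit breaker mentioned above is enabled through Spark conf. A minimal sketch of such a configuration; the property names here are my best reading of the 1.9.1 circuit breaker feature (#2407) and should be verified against the current OpenLineage Spark configuration docs:

```properties
# Trip the breaker (stop OpenLineage processing) when JVM free memory drops too low
spark.openlineage.circuitBreaker.type=javaRuntime
# Free-memory percentage threshold below which the breaker trips (assumed value)
spark.openlineage.circuitBreaker.memoryThreshold=20
# How often the breaker condition is re-checked, in milliseconds (assumed value)
spark.openlineage.circuitBreaker.circuitCheckIntervalInMillis=1000
```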
+ + + + + +
+
+ + + + +
+ +
Yannick Libert + (yannick.libert.partner@decathlon.com) +
+
2024-02-27 05:20:13
+
+

*Thread Reply:* This feature is going to be so useful for us! Love it!

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-02-26 14:23:37
+
+

@channel +We released OpenLineage 1.9.1, featuring: +• Airflow: add support for JobTypeJobFacet properties #2412 @mattiabertorello +• dbt: add support for JobTypeJobFacet properties #2411 @mattiabertorello +• Flink: support Flink Kafka dynamic source and sink #2417 @HuangZhenQiu +• Flink: support multi-topic Kafka Sink #2372 @pawel-big-lebowski +• Flink: support lineage for JDBC connector #2436 @HuangZhenQiu +• Flink: add common config gradle plugin #2461 @HuangZhenQiu +• Java: extend circuit breaker loaded with ServiceLoader #2435 @pawel-big-lebowski +• Spark: integration now emits intermediate, application level events wrapping entire job execution #2371 @mobuchowski +• Spark: support built-in lineage within DataSourceV2Relation #2394 @pawel-big-lebowski +• Spark: add support for JobTypeJobFacet properties #2410 @mattiabertorello +• Spark: stop sending spark.LogicalPlan facet by default #2433 @pawel-big-lebowski +• Spark/Flink/Java: circuit breaker #2407 @pawel-big-lebowski +• Spark: add the capability to publish Scala 2.12 and 2.13 variants of openlineage-spark #2446 @d-m-h +A large number of changes and bug fixes were also included. +Thanks to all our contributors with a special shout-out to @Damien Hawes, who contributed >10 PRs to this release! +Release: https://github.com/OpenLineage/OpenLineage/releases/tag/1.9.1 +Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md +Commit history: https://github.com/OpenLineage/OpenLineage/compare/1.8.0...1.9.1 +Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage +PyPI: https://pypi.org/project/openlineage-python/

+ + + +
+ 🚀 Jakub Dardziński, Jackson Goerner, Abdallah, Yannick Libert, Mattia Bertorello, Tristan GUEZENNEC -CROIX-, Fabio Manganiello, Maciej Obuchowski +
+ +
+ 🎉 Abdallah, Mattia Bertorello +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-02-26 14:33:27
+
+

*Thread Reply:* Outstanding work @Damien Hawes 👏

+ + + +
+ ➕ Michael Robinson, Mattia Bertorello, Fabio Manganiello, Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Abdallah + (abdallah@terrab.me) +
+
2024-02-27 00:39:29
+
+

*Thread Reply:* Thank you 👏👏

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
ldacey + (lance.dacey2@sutherlandglobal.com) +
+
2024-02-27 11:02:19
+
+

*Thread Reply:* any idea how OL releases tie into the airflow provider?

+ +

I assume that a separate apache-airflow-providers-openlineage release would be made in the future to incorporate the new features/fixes?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-02-27 11:05:55
+
+

*Thread Reply:* yes, Airflow providers are released by the Airflow community, separately from Airflow core releases

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-02-27 15:24:57
+
+

*Thread Reply:* It seems like OpenLineage Spark is still on 1.8.0? Any idea when this will be updated? Thanks!

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-02-27 15:29:28
+
+

*Thread Reply:* @Max Zheng https://openlineage.io/docs/integrations/spark/#how-to-use-the-integration

+
+
openlineage.io
+ + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-02-27 15:30:14
+
+

*Thread Reply:* Oh got it, didn't see the note +The above necessitates a change in the artifact identifier for io.openlineage:openlineage-spark. After version 1.8.0, the artifact identifier has been updated. For subsequent versions, utilize: io.openlineage:openlineage_spark_${SCALA_BINARY_VERSION}:${OPENLINEAGE_SPARK_VERSION}.

+ + + +
+
+
+
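Following the note quoted above, a dependency declaration using the post-1.8.0 coordinates might look like this (a sketch, assuming a Spark build on Scala 2.12 and the 1.9.1 release announced earlier in this channel; check Maven Central for the exact artifact id):

```groovy
dependencies {
    // After 1.8.0 the artifact id carries the Scala binary version suffix
    implementation 'io.openlineage:openlineage-spark_2.12:1.9.1'
}
```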
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-02-27 15:30:18
+
+

*Thread Reply:* Thanks!

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-02-27 15:30:36
+
+

*Thread Reply:* You're welcome.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Derya Meral + (drderyameral@gmail.com) +
+
2024-02-26 15:04:33
+
+

Hi all, I'm working on a local Airflow-OpenLineage-Marquez integration using Airflow 2.7.3 and python 3.10. Everything seems to be installed correctly with the appropriate settings. I'm seeing events, jobs, tasks trickle into the UI. I'm using the PostgresOperator. When it's time for the SQL code to be parsed, I'm seeing the following in my Airflow logs: +[2024-02-26, 19:43:17 UTC] {sql.py:457} INFO - Running statement: SELECT CURRENT_SCHEMA;, parameters: None +[2024-02-26, 19:43:17 UTC] {base.py:152} WARNING - OpenLineage provider method failed to extract data from provider. +[2024-02-26, 19:43:17 UTC] {manager.py:198} WARNING - Extractor returns non-valid metadata: None +Can anyone give me pointers on why exactly this might be happening? I've tried also with the SQLExecuteQueryOperator, same result. I previously got a Marquez setup to work with the external OpenLineage package for Airflow with Airflow 2.6.1. But I'm struggling with this newer integrated OpenLineage version

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-02-26 15:10:21
+
+

*Thread Reply:* Does this happen for some particular SQL but work for others? +Also, my understanding is that it worked with openlineage-airflow on Airflow 2.6.1 (the same code)? +What version of OL provider are you using?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Derya Meral + (drderyameral@gmail.com) +
+
2024-02-26 15:20:22
+
+

*Thread Reply:* I've been using one toy DAG and have only tried with the two operators mentioned. Currently, my team's code doesn't use provider operators so it would not really work well with OL.

+ +

Yes, it worked with Airflow 2.6.1. Same code.

+ +

Right now, I'm using apache-airflow-providers-openlineage==1.5.0 and the other OL dependencies are at 1.9.1.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-02-26 15:21:00
+
+

*Thread Reply:* Would you want to share the SQL statement?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Derya Meral + (drderyameral@gmail.com) +
+
2024-02-26 15:31:42
+
+

*Thread Reply:* It has some PII in it, but it's basically in the form of: +```DROP TABLE IF EXISTS usersmeral.keyrelations;

+ +

CREATE TABLE users_meral.key_relations AS

+ +

WITH
+staff AS ( SELECT ...)
+,enabled AS (SELECT ...)
+SELECT ...
+FROM public.borrowers
+LEFT JOIN ...;```
We're splitting the query with sqlparse.split() and feeding it to a PostgresOperator.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Derya Meral + (drderyameral@gmail.com) +
+
2024-02-27 09:26:41
+
+

*Thread Reply:* I thought I should share our configs in case I'm missing something:
+```[openlineage]
+disabled = False
+disabled_for_operators =

+ +

namespace =

+ +

extractors =

+ +

config_path = /opt/airflow/openlineage.yml +transport =

+ +

disable_source_code = ```

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Derya Meral + (drderyameral@gmail.com) +
+
2024-02-27 09:27:20
+
+

*Thread Reply:* The YAML file: +transport: + type: http + url: <http://marquez:5000>

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-04 13:01:19
+
+

*Thread Reply:* Are you running on apple silicon?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Derya Meral + (drderyameral@gmail.com) +
+
2024-03-04 15:39:05
+
+

*Thread Reply:* Yep, is that the issue?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-02-28 13:00:00
+
+

@channel +Since lineage will be the focus of a panel at Data Council Austin next month, it seems like a great opportunity to organize a meetup. Please get in touch if you might be interested in attending, presenting or hosting!

+
+
datacouncil.ai
+ + + + + + + + + + + + + + + + + +
+ + + +
+ ✅ Sheeri Cabral (Collibra), Jarek Potiuk, Howard Yoo +
+ +
+ ❤️ Harel Shein, Julian LaNeve, Paweł Leszczyński, Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Declan Grant + (declan.grant@sdktek.com) +
+
2024-02-28 14:37:16
+
+

Hi all, I'm running into an unusual issue with OpenLineage on Databricks. We're using OL 1.4.1 on a cluster that runs over 100 jobs every 30 minutes. After a couple of hours, a DRIVER_NOT_RESPONDING error starts showing up in the event log with the message Driver is up but is not responsive, likely due to GC. After a DRIVER_HEALTHY event, the error occurs again several minutes later. Is this a known issue that has been solved in a later release, or is there something I can do in Databricks to stop this?

+ +
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-02-29 05:27:20
+
+

*Thread Reply:* My guess would be that, with that many jobs scheduled in a short window, the SparkListener queue grows and some internal healthcheck times out?

+ +

Maybe you could try disabling spark.logicalPlan and spark_unknown facets to see if this speeds things up.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-02-29 09:42:27
+
+

*Thread Reply:* BTW, are you receiving OL events in the meantime?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-04 12:55:50
+
+

*Thread Reply:* Hi @Declan Grant, can you tell us if disabling the facets worked?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Declan Grant + (declan.grant@sdktek.com) +
+
2024-03-04 14:30:14
+
+

*Thread Reply:* We had already tried disabling the facets, and that did not solve the issue.

+ +

Here is the relevant spark config: +spark.openlineage.transport.type console +spark.openlineage.facets.disabled [spark_unknown;spark.logicalPlan;schema;columnLineage;dataSource] +We are not interested in column lineage at this time.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Declan Grant + (declan.grant@sdktek.com) +
+
2024-03-04 14:31:28
+
+

*Thread Reply:* OL has been uninstalled from the cluster, so I can't immediately say whether events are received while the driver is not responding.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-02-28 15:19:51
+
+

@channel +This month's issue of OpenLineage News is in inboxes now! Sign up to ensure you always get the latest issue. In this edition: a rundown of open issues, new docs and new videos, plus updates on the Airflow Provider, Spark integration and Flink integration (+ more).

+
+
openlineage.us14.list-manage.com
+ + + + + + + + + + + + + + + +
+ + + +
+ 👍 Mattia Bertorello +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Simran Suri + (mailsimransuri@gmail.com) +
+
2024-03-01 01:19:04
+
+

Hi all, I've been trying to gather clues on how OpenLineage fetches our inputs' namespace and name from our Spark codebase. Being pointed to the exact logic would be very helpful for one of my use cases.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-03-01 02:25:10
+
+

*Thread Reply:* There is no single place where the namespace is assigned to a dataset, as this strictly depends on what datasets are read. Spark, like the other OpenLineage integrations, follows the naming convention -> https://openlineage.io/docs/spec/naming
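To make that concrete, here is a small sketch (not OpenLineage code; the host, database, and table names are made up for illustration) of how a Postgres table maps to a dataset namespace and name under that naming convention:

```python
# Sketch of the OpenLineage naming convention for a Postgres source:
# the namespace identifies the datasource, the name identifies the
# dataset within it (see the naming spec linked above).
def postgres_dataset(host: str, port: int, database: str, schema: str, table: str):
    namespace = f"postgres://{host}:{port}"
    name = f"{database}.{schema}.{table}"
    return namespace, name

ns, name = postgres_dataset("db.example.com", 5432, "analytics", "public", "borrowers")
# namespace and name are what show up on the input/output datasets in events
```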

+
+
openlineage.io
+ + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Fabio Manganiello + (fabio.manganiello@booking.com) +
+
2024-03-01 04:42:12
+
+

Hi all, I'm working on propagating the parent facet from an Airflow DAG to the dbt workflows it launches, and I'm a bit puzzled by the current logic in lineage_parent_id. It generates an ID in the form namespace/name/run_id (which is the format that dbt-ol expects as well), but here name is actually a UUID generated from the job's metadata, and run_id is the internal Airflow task instance name (usually a concatenation of execution date + try number) instead of a UUID, as OpenLineage advises.

+ +

Instead of using this function I've made my own where name=<dag_id>.<task_id> (as this is the job name propagated in other OpenLineage events as well), and run_id = lineage_run_id(operator, task_instance) - basically using the UUID hashing logic for the run_id that is currently used for the name instead. This seems to be more OpenLineage-compliant and it allows us to link things properly.

+ +

Is there some reason that I'm missing behind the current logic? Things are even more confusing IMHO because there's also a new_lineage_run_id utility that calculates the run_id simply as a random UUID, without the UUID serialization logic of lineage_run_id, so it's not clear which one I'm supposed to use.

+
+ + + + + + + + + + + + + + + + +
+
+ + + + + + + + + + + + + + + + +
+ + + +
+ 👀 Kacper Muda +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Fabio Manganiello + (fabio.manganiello@booking.com) +
+
2024-03-01 05:52:28
+
+

*Thread Reply:* FYI the function I've come up with to link things properly looks like this:

+ +

```from airflow.models import BaseOperator, TaskInstance
from openlineage.airflow.macros import JOB_NAMESPACE
from openlineage.airflow.plugin import lineage_run_id

def lineage_parent_id(self: BaseOperator, task_instance: TaskInstance) -> str:
    return "/".join(
        [
            JOB_NAMESPACE,
            f"{task_instance.dag_id}.{task_instance.task_id}",
            lineage_run_id(self, task_instance),
        ]
    )```

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-03-04 04:19:39
+
+

*Thread Reply:* @Paweł Leszczyński @Jakub Dardziński - any thoughts here?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-04 05:12:15
+
+

*Thread Reply:* new_lineage_run_id is some very old util method that should be deleted imho

+ +

I agree that what you propose is more OL-compliant. Indeed, what we have in the Airflow provider for the dbt Cloud integration is pretty much the same as what you have:
+https://github.com/apache/airflow/blob/main/airflow/providers/dbt/cloud/utils/openlineage.py#L132

+ +

the reason for that, I think, is that the logic changed over time and the dbt-ol script just was not updated accordingly
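As a quick illustration of why the compliant format matters downstream, here is a sketch (a hypothetical helper, assuming the namespace itself contains no slashes) of how a consumer such as dbt-ol can unpack a namespace/name/run_id string into the shape of a parent run facet:

```python
# Hypothetical helper: unpack a "namespace/name/run_id" parent id string
# into the shape of an OpenLineage ParentRunFacet. Assumes the namespace
# contains no "/" characters.
def parse_parent_id(value: str) -> dict:
    namespace, name, run_id = value.split("/")
    return {
        "run": {"runId": run_id},
        "job": {"namespace": namespace, "name": name},
    }
```

With the non-compliant format (a task-instance name in the run_id slot), the resulting facet would not reference an actual run UUID, so runs cannot be linked.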

+
+ + + + + + + + + + + + + + + + +
+
+ + + + + + + + + + + + + + + + +
+ + + +
+ 👍 Fabio Manganiello +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-04 12:53:44
+
+

*Thread Reply:* @Fabio Manganiello would you mind opening an issue about this on GitHub?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-04 12:54:14
+
+

*Thread Reply:* https://github.com/OpenLineage/OpenLineage/issues/2488 +there is one already 🙂 @Fabio Manganiello thank you for that!

+
+ + + + + + + +
+
Comments
+ 1 +
+ + + + + + + + + + +
+ + + +
+ :gratitude_thank_you: Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-04 13:05:13
+
+

*Thread Reply:* Oops, should have checked first! Yes, thanks Fabio

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Kacper Muda + (kacper.muda@getindata.com) +
+
2024-03-04 13:19:50
+
+

*Thread Reply:* There is also a PR already, sent as a separate message by @Fabio Manganiello. And the same fix for the provider here. Some discussion is needed about what changes we can make to the macros and whether they will be "breaking", so feel free to comment.

+
+ + + + + + + + + + + + + + + + +
+
+ + + + + + + + + + + + + + + + +
+ + + +
+ :gratitude_thank_you: Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Honey Thakuria + (Honey_Thakuria@intuit.com) +
+
2024-03-01 07:49:55
+
+

Hey team, +we're trying to extract certain Spark metrics with OL using custom Facets.

+ +

But we're not getting SparkListenerTaskStart or SparkListenerTaskEnd events as part of the custom facet.

+ +

We're only able to get SparkListenerJobStart, SparkListenerJobEnd, SparkListenerSQLExecutionStart, SparkListenerSQLExecutionEnd.

+ +

This is how our custom facet code looks:
```@Override
protected void build(SparkListenerEvent event, BiConsumer<String, ? super TestRunFacet> consumer) {
    if (event instanceof SparkListenerSQLExecutionStart) { ... }
    if (event instanceof SparkListenerTaskStart) { ... }
}```
But when we're executing the same Spark SQL using a custom listener without OL facets, we're able to get task-level metrics too:
```public class IntuitSparkMetricsListener extends SparkListener {
    @Override
    public void onJobStart(SparkListenerJobStart jobStart) {
        log.info("job start logging starts");
        log.info(jobStart.toString());
    }

    @Override
    public void onTaskEnd(SparkListenerTaskEnd taskEnd) {
        ...
    }
}```
Could anyone give us input on how to get task-level metrics in the OL facet itself?
Also, is there any issue due to SparkListenerEvent vs SparkListener?

+ +

cc @Athitya Kumar @Kiran Hiremath

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-01 08:00:09
+
+

*Thread Reply:* OpenLineageSparkListener is not listening on SparkListenerTaskStart at all. It listens to SparkListenerTaskEnd, but only to fill metrics for OutputStatisticsOutputDatasetFacet

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-01 08:03:05
+
+

*Thread Reply:* I think doing this would be a not-so-small change: you'd need to add handling for those methods to the ExecutionContexts https://github.com/OpenLineage/OpenLineage/blob/31f8ce588526e9c7c4bc7d849699cb7ce2[…]java/io/openlineage/spark/agent/lifecycle/ExecutionContext.java and to OpenLineageSparkListener itself to pass them forward.

+ +

When it comes to implementation of them in particular contexts, I would make sure they don't emit unless you have something concrete set up for them, like those metrics you've set up.

+
+ + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Fabio Manganiello + (fabio.manganiello@booking.com) +
+
2024-03-04 06:57:09
+
+

Hi folks, I have created a PR to address the required changes in the Airflow lineage_parent_id macro, as discussed in my previous comment (cc @Jakub Dardziński @Damien Hawes @Mattia Bertorello)

+
+ + +
+ + + } + + Fabio Manganiello + (https://openlineage.slack.com/team/U06BV4F12JU) +
+ + + + + + + + + + + + + + + + + +
+
+ + + + + + + +
+
Labels
+ integration/airflow +
+ +
+
Comments
+ 1 +
+ + + + + + + + + + +
+ + + +
+ 👀 Kacper Muda +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-13 14:10:46
+
+

*Thread Reply:* Hey Fabio, thanks for the PR. Please let us know if you need any help with fixing tests.

+ + + +
+ 🙌 Fabio Manganiello +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-06 15:22:46
+
+

@channel +This month’s TSC meeting is next week on a new day/time: Wednesday the 13th at 9:30am PT. Please note that this will be the new day/time going forward! +On the tentative agenda: +• announcements + ◦ new integrations: DataHub and OpenMetadata + ◦ upcoming events +• recent release 1.9.1 highlights +• Scala 2.13 support in Spark overview by @Damien Hawes +• Circuit breaker in Spark & Flink @Paweł Leszczyński +• discussion items +• open discussion +More info and the meeting link can be found on the website. All are welcome! Do you have a discussion topic, use case or integration you’d like to demo? Reply here or DM me to be added to the agenda.

+
+
openlineage.io
+ + + + + + + + + + + + + + + +
+ + + +
+ 🙏 Willy Lulciuc +
+ +
+ ✅ Sheeri Cabral (Collibra) +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-03-06 19:45:11
+
+

Hi, would it be reasonable to add a flag to skip RUNNING events for the Spark integration? https://openlineage.io/docs/integrations/spark/job-hierarchy For some jobs we're seeing AsyncEventQueue report ~20s to process each event and a lot of RUNNING events being generated

+ +

IMO this might work as an alternative to https://github.com/OpenLineage/OpenLineage/issues/2375 ? It seems like it'd be more valuable to get the START/COMPLETE events vs intermediate RUNNING events

+
+
openlineage.io
+ + + + + + + + + + + + + + + +
+
+ + + + + + + +
+
Labels
+ proposal +
+ +
+
Comments
+ 1 +
+ + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-03-07 03:13:16
+
+

*Thread Reply:* Well, I think the real problem is the 20s event generation. What we should do is include the time spent on each visitor or dataset builder within a debug facet. Once this is done, we could reach out to you again and let you guide us to the code part that leads to such a scenario.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-03-07 03:13:44
+
+

*Thread Reply:* @Maciej Obuchowski do we have an issue for this? I think we discussed it recently.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-03-07 11:58:05
+
+

*Thread Reply:* > What we should do is to include timer spent on each visitor or dataset builder within debug facet. +I could help provide this data if that'd be helpful, how/what instrumentation should I add? If you've got a patch handy I could apply it locally, build, and collect this data from my test job

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-03-07 12:15:42
+
+

*Thread Reply:* It's also taking &gt; 20s per event with parquet writes instead of hudi writes in my job, so I don't think that's the culprit

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-07 14:45:59
+
+

*Thread Reply:* I'm working on instrumentation/metrics right now, will be ready for next release 🙂

+ + + +
+ 🙌 Max Zheng, Sheeri Cabral (Collibra) +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-03-11 20:04:22
+
+

*Thread Reply:* I did some manual timing and 90% of the latency is from buildInputDatasets https://github.com/OpenLineage/OpenLineage/blob/987e5b806dc8bd6c5aab5f85c97af76a87[…]enlineage/spark/agent/lifecycle/OpenLineageRunEventBuilder.java

+ +

Manual as in I modified: +long startTime = System.nanoTime(); + List&lt;InputDataset&gt; datasets = + Stream.concat( + buildDatasets(nodes, inputDatasetBuilders), + openLineageContext + .getQueryExecution() + .map( + qe -&gt; + ScalaConversionUtils.fromSeq(qe.optimizedPlan().map(inputVisitor)) + .stream() + .flatMap(Collection::stream) + .map(((Class&lt;InputDataset&gt;) InputDataset.class)::cast)) + .orElse(Stream.empty())) + .collect(Collectors.toList()); + long endTime = System.nanoTime(); + double durationInSec = (endTime - startTime) / 1_000_000_000.0; + <a href="http://log.info">log.info</a>("buildInputDatasets 1: {}s", durationInSec); +24/03/11 23:44:58 INFO OpenLineageRunEventBuilder: buildInputDatasets 1: 95.710143007s +Is there anything I can instrument/log to narrow down further why this is so slow? buildOutputDatasets is also kind of slow at ~10s

+
+ + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-12 05:57:58
+
+

*Thread Reply:* @Max Zheng it's not extremely easy because sometimes QueryPlanVisitors/DatasetBuilders delegate work to other ones, but I think I'll have a relatively good solution soon: https://github.com/OpenLineage/OpenLineage/pull/2496

+ + + +
+ 👍 Paweł Leszczyński, Max Zheng +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-03-12 12:20:24
+
+

*Thread Reply:* Got it, should I open a Github issue to track this?

+ +

For context the code is +def load_df_with_schema(spark: SparkSession, s3_base: str): + schema = load_schema(spark, s3_base) + file_paths = get_file_paths(spark, "/".join([s3_base, "manifest.json"])) + return spark.read.format("json").load( + file_paths, + schema=schema, + mode="FAILFAST", + ) +And the input schema has ~250 columns

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-12 12:24:00
+
+

*Thread Reply:* the instrumentation issues are already there, but please do open an issue for the slowness 👍

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-12 12:24:34
+
+

*Thread Reply:* and yes, it can be some degenerate case where we do something way more often than once

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-03-12 12:25:19
+
+

*Thread Reply:* Got it, I'll try to create a working reproduction and ticket it 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Max Zheng + (mzheng@plaid.com) +
+
2024-03-13 16:18:31
+
+

*Thread Reply:* Created https://github.com/OpenLineage/OpenLineage/issues/2511

+
+ + + + + + + +
+
Comments
+ 1 +
+ + + + + + + + + + +
+ + + +
+ 👍 Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Suhas Shenoy + (ksuhasshenoy@gmail.com) +
+
2024-03-07 02:02:43
+
+

Hi team... I am trying to emit OpenLineage events from a Spark job. When I submit the job using spark-submit, this is what I see in the console.

+ +

ERROR AsyncEventQueue: Listener OpenLineageSparkListener threw an exception +io.openlineage.client.OpenLineageClientException: io.openlineage.spark.shaded.com.fasterxml.jackson.databind.JsonMappingException: Failed to find TransportBuilder (through reference chain: io.openlineage.client.OpenLineageYaml["transport"]) + at io.openlineage.client.OpenLineageClientUtils.loadOpenLineageYaml(OpenLineageClientUtils.java:149) + at io.openlineage.spark.agent.ArgumentParser.extractOpenlineageConfFromSparkConf(ArgumentParser.java:114) + at io.openlineage.spark.agent.ArgumentParser.parse(ArgumentParser.java:78) + at io.openlineage.spark.agent.OpenLineageSparkListener.initializeContextFactoryIfNotInitialized(OpenLineageSparkListener.java:277) + at io.openlineage.spark.agent.OpenLineageSparkListener.onJobStart(OpenLineageSparkListener.java:110) + at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:37) + at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28) + at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) + at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) + at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117) + at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101) + at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105) + at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105) + at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23) + at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) + at <a href="http://org.apache.spark.scheduler.AsyncEventQueue.org">org.apache.spark.scheduler.AsyncEventQueue.org</a>$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100) + at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96) + at 
org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1356) + at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96) +Caused by: io.openlineage.spark.shaded.com.fasterxml.jackson.databind.JsonMappingException: Failed to find TransportBuilder (through reference chain: io.openlineage.client.OpenLineageYaml["transport"]) + at io.openlineage.spark.shaded.com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:402) + at io.openlineage.spark.shaded.com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:361) + at io.openlineage.spark.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializerBase.wrapAndThrow(BeanDeserializerBase.java:1853) + at io.openlineage.spark.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:316) + at io.openlineage.spark.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:177) + at io.openlineage.spark.shaded.com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323) + at io.openlineage.spark.shaded.com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4825) + at io.openlineage.spark.shaded.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3809) + at io.openlineage.client.OpenLineageClientUtils.loadOpenLineageYaml(OpenLineageClientUtils.java:147) + ... 
18 more +Caused by: java.lang.IllegalArgumentException: Failed to find TransportBuilder + at io.openlineage.client.transports.TransportResolver.lambda$getTransportBuilder$3(TransportResolver.java:38) + at java.base/java.util.Optional.orElseThrow(Optional.java:403) + at io.openlineage.client.transports.TransportResolver.getTransportBuilder(TransportResolver.java:37) + at io.openlineage.client.transports.TransportResolver.resolveTransportConfigByType(TransportResolver.java:16) + at io.openlineage.client.transports.TransportConfigTypeIdResolver.typeFromId(TransportConfigTypeIdResolver.java:35) + at io.openlineage.spark.shaded.com.fasterxml.jackson.databind.jsontype.impl.TypeDeserializerBase._findDeserializer(TypeDeserializerBase.java:159) + at io.openlineage.spark.shaded.com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId(AsPropertyTypeDeserializer.java:151) + at io.openlineage.spark.shaded.com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:136) + at io.openlineage.spark.shaded.com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType(AbstractDeserializer.java:263) + at io.openlineage.spark.shaded.com.fasterxml.jackson.databind.deser.impl.FieldProperty.deserializeAndSet(FieldProperty.java:147) + at io.openlineage.spark.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:314) + ... 23 more +Can I get any help on this?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-03-07 02:20:32
+
+

*Thread Reply:* Looks like a misconfigured transport. Please refer to this -> https://openlineage.io/docs/integrations/spark/configuration/transport and https://openlineage.io/docs/integrations/spark/configuration/spark_conf for more details. I think you're missing the spark.openlineage.transport.type property.

+
+
openlineage.io
+ + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Suhas Shenoy + (ksuhasshenoy@gmail.com) +
+
2024-03-07 02:28:10
+
+

*Thread Reply:* This is my configuration of the transport:
+conf.set("sparkscalaversion", "2.12")
+ conf.set("spark.extraListeners","io.openlineage.spark.agent.OpenLineageSparkListener")
+ conf.set("spark.openlineage.transport.type","http")
+ conf.set("spark.openlineage.transport.url","<http://localhost:8082>")
+ conf.set("spark.openlineage.transport.endpoint","/event")
+ conf.set("spark.extraListeners","io.openlineage.spark.agent.OpenLineageSparkListener")
+During spark-submit if I include
+--packages "io.openlineage:openlineage-spark:1.8.0"
+I am able to receive events.

+ +

I have already included this line in build.sbt +libraryDependencies += "io.openlineage" % "openlineage-spark" % "1.8.0"

+ +

So I don't understand why I have to pass the packages again

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-03-07 03:07:04
+
+

*Thread Reply:* OK, the configuration is OK. I think that when using libraryDependencies you lose the manifest from within our JAR, which is used by the ServiceLoader

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-03-07 03:07:40
+
+

*Thread Reply:* this is happening here -> https://github.com/OpenLineage/OpenLineage/blob/main/client/java/src/main/java/io/openlineage/client/transports/TransportResolver.java#L32

+
+ + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-03-07 03:08:37
+
+

*Thread Reply:* And this is the known issue related to this -> https://github.com/OpenLineage/OpenLineage/issues/1860

+
+ + + + + + + +
+
Assignees
+ <a href="https://github.com/pawel-big-lebowski">@pawel-big-lebowski</a> +
+ +
+
Labels
+ bug, integration/spark +
+ + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-03-07 03:09:47
+
+

*Thread Reply:* This comment -> https://github.com/OpenLineage/OpenLineage/issues/1860#issuecomment-1750536744 explains this and shows how to fix this. I am happy to help new contributors with this.

+
+ + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Suhas Shenoy + (ksuhasshenoy@gmail.com) +
+
2024-03-07 03:10:57
+
+

*Thread Reply:* Thanks for the detailed reply and pointers. Will look into it.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-07 15:56:52
+
+

@channel +The big redesign of Marquez Web is out now following a productive testing period and some modifications along with added features. In addition to a wholesale redesign including column lineage support, it includes a new dataset tagging feature. It's worth checking out as a consumption layer in your lineage solution. A blog post with more details is coming soon, but here are some screenshots to whet your appetite. (See the thread for a screencap of the column lineage display.) +Marquez quickstart: https://marquezproject.ai/docs/quickstart/ +The release itself: https://github.com/MarquezProject/marquez/releases/tag/0.45.0

+ + + + + + +
+ 🤯 Ross Turk, Julien Le Dem, Harel Shein, Juan Luis Cano Rodríguez, Paweł Leszczyński, Mattia Bertorello, Rodrigo Maia +
+ +
+ ❤️ Harel Shein, Peter Huang, Kengo Seki, Paul Wilson Villena, Paweł Leszczyński, Mattia Bertorello, alexandre bergere, Rodrigo Maia, Maciej Obuchowski, Ernie Ostic, Dongjin Seo +
+ +
+ ✅ Sheeri Cabral (Collibra) +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Cory Visi + (cvisi@amazon.com) +
+
2024-03-07 17:34:18
+
+

*Thread Reply:* Are those field descriptions coming from emitted events? or from a defined schema that's being added by marquez?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Ted McFadden + (tmcfadden@consoleconnect.com) +
+
2024-03-07 17:51:42
+
+

*Thread Reply:* Nice work! Are there any examples of the mode being switched from table level to column level, or do I misunderstand what mode is?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-07 17:52:11
+
+

*Thread Reply:* @Cory Visi Those are coming from the events. The screenshots are of the UI seeded with metadata. You can find the JSON used for this here: https://github.com/MarquezProject/marquez/blob/main/docker/metadata.json

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-07 17:53:38
+
+

*Thread Reply:* The three screencaps in my first message actually don't include the column lineage display feature (but there are lots of other upgrades in the release)

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-07 17:55:56
+
+

*Thread Reply:* column lineage view:

+ + + + +
+ ❤️ Paweł Leszczyński, Rodrigo Maia, Cory Visi +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Ted McFadden + (tmcfadden@consoleconnect.com) +
+
2024-03-07 18:01:21
+
+

*Thread Reply:* Thanks, that's what I wanted to get a look at. Cheers

+ + + +
+ 👍 Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-07 18:01:25
+
+

*Thread Reply:* @Ted McFadden what the initial 3 screencaps show is switching between the graph view and detailed views of the datasets and jobs

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
David Sharp + (davidsharp7@gmail.com) +
+
2024-03-07 23:59:42
+
+

*Thread Reply:* Hey, with the tagging we’ve identified a slight bug - a PR has been put in to fix it.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Rodrigo Maia + (rodrigo.maia@manta.io) +
+
2024-03-08 05:31:15
+
+

*Thread Reply:* The "query" section looks awesome, congrats!!! But from the OpenLineage side, when is the query attribute available?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Cory Visi + (cvisi@amazon.com) +
+
2024-03-08 07:36:29
+
+

*Thread Reply:* Fantastic work!

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-08 07:55:30
+
+

*Thread Reply:* @Rodrigo Maia the OpenLineage spec supports this via the SQLJobFacet. See: https://github.com/OpenLineage/OpenLineage/blob/main/spec/facets/SQLJobFacet.json
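For reference, a sketch of what that facet's payload looks like on a job (the field values here are illustrative, and the exact _producer/_schemaURL values come from the spec file linked above):

```python
# Illustrative SQLJobFacet payload: the "query" field carries the SQL text
# for the job; _producer and _schemaURL are standard facet metadata.
sql_job_facet = {
    "sql": {
        "_producer": "https://github.com/OpenLineage/OpenLineage",
        "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/SQLJobFacet.json",
        "query": "SELECT id, name FROM public.borrowers",
    }
}
```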

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Ernie Ostic + (ernie.ostic@getmanta.com) +
+
2024-03-08 08:42:40
+
+

*Thread Reply:* Thanks Michael....do we have a list of which providers are known to be populating the SQL JobFacet (assuming that the solution emitting the events uses SQL and has access to it)?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-08 08:59:24
+
+

*Thread Reply:* @Maciej Obuchowski or @Jakub Dardziński can add more detail, but this doc has a list of operators supported by the SQL parser.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-08 09:01:13
+
+

*Thread Reply:* yeah, so basically any of the operators that is sql-compatible - SQLExecuteQueryOperator + Athena, BQ I think

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Ernie Ostic + (ernie.ostic@getmanta.com) +
+
2024-03-08 09:05:45
+
+

*Thread Reply:* Thanks! That helps for Airflow --- do we know if any other Providers are fully supporting this powerful facet?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-08 09:07:45
+
+

*Thread Reply:* whoa, powerful 😅
+I just checked the sources, the only one missing from the above is CopyFromExternalStageToSnowflakeOperator

+ +

are you interested in some specific ones?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-08 09:08:49
+
+

*Thread Reply:* and ofc you can have SQLJobFacet coming from dbt or spark as well or any other systems triggered via Airflow

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Ernie Ostic + (ernie.ostic@getmanta.com) +
+
2024-03-08 11:03:36
+
+

*Thread Reply:* Thanks Jakub. It will be interesting to know which providers we are certain provide SQL, that are entirely independent of Airflow.

+ + + +
+ ✅ Sheeri Cabral (Collibra) +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-08 11:07:50
+
+

*Thread Reply:* I don’t think we have any facet-oriented docs (e.g. what produces SQLJobFacet) and if that makes sense

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Ernie Ostic + (ernie.ostic@getmanta.com) +
+
2024-03-08 11:14:40
+
+

*Thread Reply:* Thanks. Ultimately, it's a bigger question that we've talked about before, about best ways to document and validate what things/facets you can support/consume (as a consumer) or which you support/populate as a provider.

+ + + +
+ ✅ Sheeri Cabral (Collibra) +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-08 11:16:05
+
+

*Thread Reply:* The doc that @Michael Robinson shared is automatically generated from Airflow code, so it should provide the best option for built-in operators. If we're talking about providers/operators outside the Airflow repo, then I think @Julien Le Dem’s registry proposal would best support that need.

+ + + +
+ ☝️ Jakub Dardziński, Ernie Ostic +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Athitya Kumar + (athityakumar@gmail.com) +
+
2024-03-07 23:44:08
+
+

Hey team. Is column/attribute-level lineage supported for input/output Kafka topics in the OpenLineage Flink listener?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-03-08 02:07:58
+
+

*Thread Reply:* Column level lineage is currently not supported for Flink

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Ruchira Prasad + (ruchiraprasad@gmail.com) +
+
2024-03-08 04:57:20
+
+

Could someone explain the "OTHER" run state and whether we can use it to send lineage events to check the health of a service that runs in the background and is triggered at intervals? +It would be really helpful if someone could send an example JSON for the "OTHER" run state.

+ +
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-03-08 05:17:19
+
+

*Thread Reply:* The example idea behind OTHER was: imagine a system that requests compute resources and would like to emit an OpenLineage event about the request being made. That's why OTHER can occur before START. Another idea was to allow OTHER elsewhere to provide agility for new scenarios. However, we want to restrict which event types are terminating ones, and we don't want OTHER among them. This is important for lineage consumers: when they receive a terminating event for a given run, they know all the events related to the run have been emitted.
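Since an example JSON was requested above, here is a hedged sketch of what an OTHER event could look like for the compute-request scenario — the run ID, job name, namespace, and producer URI are all made up:

```python
# Sketch of a non-terminal OTHER event emitted before START,
# e.g. when compute resources are being requested. Values are illustrative.
other_event = {
    "eventType": "OTHER",
    "eventTime": "2024-03-08T10:00:00.000Z",
    "producer": "https://example.com/my-scheduler",
    "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json",
    "run": {"runId": "d46e465b-d358-4d32-83d4-df660ff614dd"},
    "job": {"namespace": "my-namespace", "name": "background_job"},
    "inputs": [],
    "outputs": [],
}

# OTHER is not a terminating event type, so consumers must not
# treat it as the end of the run.
assert other_event["eventType"] not in {"COMPLETE", "FAIL", "ABORT"}
```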

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Ruchira Prasad + (ruchiraprasad@gmail.com) +
+
2024-03-08 05:38:21
+
+

*Thread Reply:* @Paweł Leszczyński Is it possible to track the health of a service by using OpenLineage events? If so, how? +As an example, I have a Windows service, and I want to make sure the service is up and running.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-03-08 05:53:58
+
+

*Thread Reply:* It depends on what you mean by "service". If you consider a data-processing job a service, then you can track whether it completes successfully.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-08 07:08:46
+
+

*Thread Reply:* I think other systems would be more suited for healthchecks, like OpenTelemetry or Datadog

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Efthymios Hadjimichael + (ehadjimichael@id5.io) +
+
2024-03-08 07:22:03
+
+

hey there, trying to configure databricks spark with the openlineage spark listener 🧵

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Efthymios Hadjimichael + (ehadjimichael@id5.io) +
+
2024-03-08 07:22:52
+
+

*Thread Reply:* databricks runtime for clusters: +14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12) +we are shipping a global init script that looks like the following: +```#!/bin/bash +VERSION="1.9.1" +SCALA_VERSION="2.12" +wget -O /mnt/driver-daemon/jars/openlineage-spark$${SCALA_VERSION}-$${VERSION}.jar https://repo1.maven.org/maven2/io/openlineage/openlineage-spark$${SCALA_VERSION}/$${VERSION}/openlineage-spark$${SCALA_VERSION}-$${VERSION}.jar

+ +

SPARK_DEFAULTS_FILE="/databricks/driver/conf/00-openlineage-defaults.conf"

+ +

if [[ $DB_IS_DRIVER = "TRUE" ]]; then + cat > $SPARK_DEFAULTS_FILE <<- EOF + [driver] { + "spark.extraListeners" = "com.databricks.backend.daemon.driver.DBCEventLoggingListener,io.openlineage.spark.agent.OpenLineageSparkListener" + "spark.openlineage.version" = "v1" + "spark.openlineage.transport.type" = "http" + "spark.openlineage.transport.url" = "https://some.url" + "spark.openlineage.dataset.removePath.pattern" = "(\/[a-z]+[-a-zA-Z0-9]+)+(?<remove>.*)" + "spark.openlineage.namespace" = "some_namespace" + } +EOF +fi``` +with openlineage-spark 1.9.1
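As a side note on the removePath pattern in the script above: its behavior can be sanity-checked locally. A sketch in Python (Java's (?<remove>...) group syntax becomes (?P<remove>...) in Python; the example path is made up):

```python
import re

# The "remove" group captures the tail of the path that OpenLineage
# strips from the dataset name; the leading segments that match the
# repeated group are kept.
pattern = re.compile(r"(/[a-z]+[-a-zA-Z0-9]+)+(?P<remove>.*)")

path = "/warehouse/sales/2024-03-01/part-0000.parquet"
m = pattern.match(path)
# "/2024-03-01" starts with a digit, so the repeated group stops before it.
cleaned = path[: m.start("remove")]
print(cleaned)  # /warehouse/sales
```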

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Efthymios Hadjimichael + (ehadjimichael@id5.io) +
+
2024-03-08 07:23:38
+
+

*Thread Reply:* getting fatal exceptions: +24/03/07 14:14:05 ERROR DatabricksMain$DBUncaughtExceptionHandler: Uncaught exception in thread spark-listener-group-shared! +java.lang.NoClassDefFoundError: com/databricks/sdk/scala/dbutils/DbfsUtils + at io.openlineage.spark.agent.facets.builder.DatabricksEnvironmentFacetBuilder.getDbfsUtils(DatabricksEnvironmentFacetBuilder.java:124) + at io.openlineage.spark.agent.facets.builder.DatabricksEnvironmentFacetBuilder.getDatabricksEnvironmentalAttributes(DatabricksEnvironmentFacetBuilder.java:92) + at io.openlineage.spark.agent.facets.builder.DatabricksEnvironmentFacetBuilder.build(DatabricksEnvironmentFacetBuilder.java:58) +and spark driver crashing when spark runs

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Efthymios Hadjimichael + (ehadjimichael@id5.io) +
+
2024-03-08 07:28:43
+
+

*Thread Reply:* browsing the code for 1.9.1 shows that the exception comes from trying to access the class for databricks dbfsutils here

+ +

should I file a bug on github, or am I doing something very wrong here?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-08 07:53:00
+
+

*Thread Reply:* Looks like something has changed in the Databricks 14 🤔

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-08 07:53:17
+
+

*Thread Reply:* Issue on GitHub is the right way

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Efthymios Hadjimichael + (ehadjimichael@id5.io) +
+
2024-03-08 07:53:49
+
+

*Thread Reply:* thanks, opening one now with this information.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Efthymios Hadjimichael + (ehadjimichael@id5.io) +
+
2024-03-08 09:21:24
+
+

*Thread Reply:* link to issue for anyone interested, thanks again!

+ + + +
+ 👍 Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Abdallah + (abdallah@terrab.me) +
+
2024-03-15 10:09:00
+
+

*Thread Reply:* Hi @Maciej Obuchowski I am having the same issue with older versions of Databricks.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Abdallah + (abdallah@terrab.me) +
+
2024-03-18 02:47:30
+
+

*Thread Reply:* I don't think the Spark integration is working anymore for any of the Databricks environments, not only version 14.

+ + + +
+ ➕ Tristan GUEZENNEC -CROIX- +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Abdallah + (abdallah@terrab.me) +
+
2024-03-18 05:38:05
+ +
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-18 07:14:09
+
+

*Thread Reply:* @Abdallah are you willing to provide PR?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Abdallah + (abdallah@terrab.me) +
+
2024-03-18 11:51:20
+
+

*Thread Reply:* I am having a look

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Abdallah + (abdallah@terrab.me) +
+
2024-03-20 04:45:02
+
+

*Thread Reply:* https://github.com/OpenLineage/OpenLineage/pull/2530

+
+ + + + + + + +
+
Labels
+ integration/spark +
+ + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
slackbot + +
+
2024-03-08 12:04:26
+
+

This message was deleted.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-08 12:13:32
+
+

*Thread Reply:* is what you sent an event for DAG or task?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-08 12:22:32
+
+

*Thread Reply:* so far Marquez cannot show job hierarchy (a DAG is the parent of its tasks), so you need to click on one of the tasks in the UI to see the proper view

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-08 12:33:25
+
+

*Thread Reply:* is this the only job listed?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-08 12:33:37
+
+

*Thread Reply:* no, I can see 191 total

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-08 12:34:22
+
+

*Thread Reply:* what if you choose any other job that has ACustomingestionDag. prefix?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-08 12:39:24
+
+

*Thread Reply:* you also have namespaces in right upper corner. datasets are probably in different namespace than Airflow jobs

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-08 12:47:52
+
+

*Thread Reply:* https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/supported_classes.html

+ +

this is the list of supported operators currently

+ +

not all of them send dataset information, e.g. PythonOperator

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Nargiza Fernandez + (nargizafernandez@gmail.com) +
+
2024-03-08 14:06:35
+
+

hi everyone!

+ +

I configured OpenLineage + Marquez for my Amazon-managed Apache Airflow to get better insight into the DAGs. For the implementation I followed the https://aws.amazon.com/blogs/big-data/automate-data-lineage-on-amazon-mwaa-with-openlineage/ guide, using the Helm/k8s option. +Marquez is up and running and I can see my DAGs and dependent DAGs in the Jobs section; however, when clicking on any of the DAGs in the jobs list I see only one job without any dependencies. I would like to see the whole chain of task execution. How can I achieve this goal? Please advise.

+ +

additional information: +we don't have Datasets in our MWAA. +MWAA Airflow - v. 2.7.2 +OpenLineage plugin.py - +from airflow.plugins_manager import AirflowPlugin +from airflow.models import Variable +import os

+ +

os.environ["OPENLINEAGE_URL"] = Variable.get('OPENLINEAGE_URL', default_var='')

+ +

class EnvVarPlugin(AirflowPlugin): + name = "env_var_plugin"

+ +

requirements.txt: +httplib2 +urllib3 +oauth2client +bingads +pymssql +certifi +facebook_business +mysql-connector-python +google-api-core +google-auth +google-api-python-client +apiclient +google-auth-httplib2 +google-auth-oauthlib +pymongo +pandas +numpy +pyarrow +apache-airflow-providers-openlineage

+ +

Also, where can I find the meaning of the Depth, Complete Mode, and Compact Nodes options? I believe they are view options?

+ +

Thank you in advance for your help!

+ +
+ + + + + + + + + +
+ + + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Willy Lulciuc + (willy@datakin.com) +
+
2024-03-08 14:17:50
+
+

*Thread Reply:* Jobs may not have any dependencies depending on the Airflow operator used (ex: PythonOperator). Can you provide the OL events for the job you expect to have inputs/outputs? In the Marquez Web UI, you can use the events tab:

+ + + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Nargiza Fernandez + (nargizafernandez@gmail.com) +
+
2024-03-08 14:42:14
+
+

*Thread Reply:* I expect to see dependencies for all my jobs. I was hoping Marquez would show a view similar to Airflow's, making it easier to troubleshoot failed DAGs. Please refer to the image below.

+ +
+ + + + + + + + + +
+
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Nargiza Fernandez + (nargizafernandez@gmail.com) +
+
2024-03-08 17:02:09
+
+

*Thread Reply:* is this what you requested?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Nargiza Fernandez + (nargizafernandez@gmail.com) +
+
2024-03-11 10:19:28
+
+

*Thread Reply:* hello! @Willy Lulciuc could you please guide me further? what can be done to see the whole chain of DAG execution in openlineage/marquez?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Nargiza Fernandez + (nargizafernandez@gmail.com) +
+
2024-03-11 14:42:01
+
+

*Thread Reply:* from textwrap import dedent +import mysql.connector +import pymongo +import logging +import sys +import ast +from airflow import DAG +from airflow.operators.python import PythonOperator +from airflow.operators.trigger_dagrun import TriggerDagRunOperator +from airflow.operators.python import BranchPythonOperator +from airflow.providers.http.operators.http import SimpleHttpOperator +from airflow.models import Variable +from bson.objectid import ObjectId +we do use PythonOperator, however we are specifying task dependencies in the DAG code, example:

+ +

error_task = PythonOperator( + task_id='error', + python_callable=error, + dag=dag, + trigger_rule = "one_failed" + ) + +transformed_task >> generate_dict >> api_trigger_dependent_dag >> error_task +for this case, is there a way to have a detailed view in the Marquez Web UI?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Nargiza Fernandez + (nargizafernandez@gmail.com) +
+
2024-03-11 14:50:17
+
+

*Thread Reply:* @Jakub Berezowski hello! could you please take a look at my case and advice what can be done whenever you have time? thank you!

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Suresh Kumar + (ssureshkumar6@gmail.com) +
+
2024-03-10 04:35:02
+
+

Hi All, +I'm based out of Sydney, and we are using OpenLineage on the Azure data platform. +I'm looking for some direction and support; we are currently stuck on lineage creation from Spark (Azure Synapse Analytics): +PySpark is not able to emit lineage when some complex transformations are happening. +The OpenLineage version we are currently using is v0.18, and the Spark version is 3.2.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Kacper Muda + (kacper.muda@getindata.com) +
+
2024-03-11 03:54:43
+
+

*Thread Reply:* Hi, could you provide some more details on the issue you are facing? Some debug logs, specific error message, pyspark code that causes the issue? Also, current OpenLineage version is 1.9.1 , is there any reason you are using an outdated 0.18?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Suresh Kumar + (ssureshkumar6@gmail.com) +
+
2024-03-11 19:15:18
+
+

*Thread Reply:* Thanks for the heads-up. We are in the process of upgrading the library and will get back to you.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Kylychbek Zhumabai uulu + (kylychbekeraliev2000@gmail.com) +
+
2024-03-11 12:51:09
+
+

Hello everyone, has anyone integrated AWS MWAA with OpenLineage? I'm trying it, but it is not working. Can you share some ideas and steps if you have experience with that?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-12 12:37:47
+
+

@channel +This month's TSC meeting, open to all, is tomorrow at 9:30 PT. The updated agenda includes exciting news of new integrations and presentations by @Damien Hawes and @Paweł Leszczyński. Hope to see you there! https://openlineage.slack.com/archives/C01CK9T7HKR/p1709756566788589

+
+ + +
+ + + } + + Michael Robinson + (https://openlineage.slack.com/team/U02LXF3HUN7) +
+ + + + + + + + + + + + + + + + + +
+ + + +
+ 🚀 Mattia Bertorello, Maciej Obuchowski, Sheeri Cabral (Collibra), Paweł Leszczyński +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Suhas Shenoy + (ksuhasshenoy@gmail.com) +
+
2024-03-13 10:28:41
+
+

Hi team. If we are trying to send OpenLineage events from a Spark job to a Kafka endpoint that requires keystore- and truststore-related properties to be configured, how can we configure them?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Kacper Muda + (kacper.muda@getindata.com) +
+
2024-03-13 10:33:48
+
+

*Thread Reply:* Hey, check out these docs and the spark.openlineage.transport.properties.[xxx] configuration. Is this what you are looking for?
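A hedged sketch of what that could look like for a Kafka transport with SSL stores — the topic name, broker address, passwords, and store paths below are placeholders; the properties.[xxx] entries are passed through to the underlying Kafka producer, so the standard Kafka SSL client properties should apply:

```properties
spark.openlineage.transport.type=kafka
spark.openlineage.transport.topicName=openlineage.events
spark.openlineage.transport.properties.bootstrap.servers=broker:9093
spark.openlineage.transport.properties.security.protocol=SSL
spark.openlineage.transport.properties.ssl.keystore.location=/path/to/keystore.jks
spark.openlineage.transport.properties.ssl.keystore.password=changeit
spark.openlineage.transport.properties.ssl.truststore.location=/path/to/truststore.jks
spark.openlineage.transport.properties.ssl.truststore.password=changeit
```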

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Suhas Shenoy + (ksuhasshenoy@gmail.com) +
+
2024-03-13 11:08:49
+
+

*Thread Reply:* Yes... Thanks

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Rodrigo Maia + (rodrigo.maia@manta.io) +
+
2024-03-13 11:46:09
+
+

Hello all 👋! +Has anyone tried to use Spark UDFs with OpenLineage? +Does it make sense for column-level lineage to stop working in this context?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-03-15 08:47:54
+
+

*Thread Reply:* did you investigate if it still works on a table-level?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-03-15 08:49:50
+
+

*Thread Reply:* (I haven’t tried it, but looking at Spark UDFs it looks like there are many differences - https://medium.com/@suffyan.asad1/a-deeper-look-into-spark-user-defined-functions-537c6efc5fb3 - nothing is jumping out at me as “this is why it doesn’t work”, though.)

+
+
Medium
+ + + + + + +
+
Reading time
+ 10 min read +
+ + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-03-14 03:49:21
+
+

This week brought us many fixes to the Flink integration like: +• #2507 which resolves critical issues introduced in recent release, +• #2508 which makes JDBC dataset naming consistent with dataset naming convention and having a common code for Spark & Flink to extract dataset identifier from JDBC connection url. +• #2512 which includes database schema in dataset identifier for JDBC integration in Flink. +These are significant improvements and I think they should not wait for the next release cycle. +I would like to start a vote for an immediate release.

+ + + +
+ ➕ Kacper Muda, Paweł Leszczyński, Mattia Bertorello, Maciej Obuchowski, Harel Shein, Damien Hawes, Peter Huang +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-14 10:46:42
+
+

*Thread Reply:* Thanks, all. The release is approved.

+ + + +
+ 🙌 Paweł Leszczyński +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-14 15:26:58
+
+

*Thread Reply:* Changelog PR is here: https://github.com/OpenLineage/OpenLineage/pull/2516

+
+ + + + + + + +
+
Labels
+ documentation +
+ + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-15 11:05:02
+
+

@channel +We released OpenLineage 1.10.2, featuring:

+ +

Additions +• Dagster: add new provider for version 1.6.10 #2518 @JDarDagran +• Flink: support lineage for a hybrid source #2491 @HuangZhenQiu +• Flink: bump Flink JDBC connector version #2472 @HuangZhenQiu +• Java: add a OpenLineageClientUtils#loadOpenLineageJson(InputStream) and change OpenLineageClientUtils#loadOpenLineageYaml(InputStream) methods #2490 @d-m-h +• Java: add info from the HTTP response to the client exception #2486 @davidjgoss +• Python: add support for MSK IAM authentication with a new transport #2478 @mattiabertorello +Removal +• Airflow: remove redundant information from facets #2524 @kacpermuda +Fixes +• Airflow: proceed without rendering templates if task_instance copy fails #2492 @kacpermuda +• Flink: fix class not found issue for Cassandra #2507 @pawel-big-lebowski +• Flink: refine the JDBC table name #2512 @HuangZhenQiu +• Flink: fix JDBC dataset naming #2508 @pawel-big-lebowski +• Flink: fix failure due to missing Cassandra classes #2507 @pawel-big-lebowski +• Flink: fix release runtime dependencies #2504 @HuangZhenQiu +• Spark: fix the HttpTransport timeout #2475 @pawel-big-lebowski +• Spark: prevent NPE if the context is null #2515 @pawel-big-lebowski +• Spec: improve Cassandra lineage metadata #2479 @HuangZhenQiu +Thanks to all the contributors with a shout out to @Maciej Obuchowski for the after-hours CI fix! +Release: https://github.com/OpenLineage/OpenLineage/releases/tag/1.10.2 +Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md +Commit history: https://github.com/OpenLineage/OpenLineage/compare/1.9.1...1.10.2 +Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage +PyPI: https://pypi.org/project/openlineage-python/

+ + + +
+ 🚀 Maciej Obuchowski, Kacper Muda, Mattia Bertorello, Paweł Leszczyński +
+ +
+ 🔥 Maciej Obuchowski, Mattia Bertorello, Paweł Leszczyński, Peter Huang +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
GUNJAN YADU + (gunjanyadu6@gmail.com) +
+
2024-03-18 08:12:43
+
+

Hi, I am new to OpenLineage. Can someone help me understand how exactly it is set up, and how I can set it up on my personal laptop and play with it to gain hands-on experience?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Kacper Muda + (kacper.muda@getindata.com) +
+
2024-03-18 08:15:17
+
+

*Thread Reply:* Hey, check out our Getting Started guide, and the whole documentation on Python, Java, Spark, etc., where you will find all the information about the setup and configuration. For Airflow>=2.7, there is separate documentation

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
GUNJAN YADU + (gunjanyadu6@gmail.com) +
+
2024-03-18 08:52:41
+
+

*Thread Reply:* I am getting this error when following the commands on my Windows laptop: +git clone git@github.com:MarquezProject/marquez.git && cd marquez/docker +running up.sh --seed +marquez-api | WARNING 'MARQUEZ_CONFIG' not set, using development configuration. +seed-marquez-with-metadata | wait-for-it.sh: waiting 15 seconds for api:5000 +marquez-web | [HPM] Proxy created: /api/v1 -> http://api:5000/ +marquez-web | App listening on port 3000! +marquez-api | INFO [2024-03-18 12:45:01,702] org.eclipse.jetty.util.log: Logging initialized @1991ms to org.eclipse.jetty.util.log.Slf4jLog +marquez-api | INFO [2024-03-18 12:45:01,795] io.dropwizard.server.DefaultServerFactory: Registering jersey handler with root path prefix: / +marquez-api | INFO [2024-03-18 12:45:01,796] io.dropwizard.server.DefaultServerFactory: Registering admin handler with root path prefix: / +marquez-api | INFO [2024-03-18 12:45:01,797] io.dropwizard.assets.AssetsBundle: Registering AssetBundle with name: graphql-playground for path /graphql-playground/** +marquez-api | INFO [2024-03-18 12:45:01,807] marquez.MarquezApp: Running startup actions... +marquez-api | INFO [2024-03-18 12:45:01,842] org.flywaydb.core.internal.license.VersionPrinter: Flyway Community Edition 8.5.13 by Redgate +marquez-api | INFO [2024-03-18 12:45:01,842] org.flywaydb.core.internal.license.VersionPrinter: See what's new here: https://flywaydb.org/documentation/learnmore/releaseNotes#8.5.13 +marquez-api | INFO [2024-03-18 12:45:01,842] org.flywaydb.core.internal.license.VersionPrinter: +marquez-db | 2024-03-18 12:45:02.039 GMT [34] FATAL: password authentication failed for user "marquez" +marquez-db | 2024-03-18 12:45:02.039 GMT [34] DETAIL: Role "marquez" does not exist. +marquez-db | Connection matched pg_hba.conf line 100: "host all all all scram-sha-256" +marquez-api | ERROR [2024-03-18 12:45:02,046] org.apache.tomcat.jdbc.pool.ConnectionPool: Unable to create initial connections of pool. +marquez-api | ! 
org.postgresql.util.PSQLException: FATAL: password authentication failed for user "marquez"

+ +

Do I have to do any additional setup to run Marquez locally?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Kacper Muda + (kacper.muda@getindata.com) +
+
2024-03-18 09:02:47
+
+

*Thread Reply:* I don't think OpenLineage and Marquez support Windows in any way

+ + + +
+ 👍 Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-18 09:04:57
+
+

*Thread Reply:* But another way to explore OL and Marquez is with GitPod: https://github.com/MarquezProject/marquez?tab=readme-ov-file#try-it

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-18 09:05:17
+
+

*Thread Reply:* Also, @GUNJAN YADU have you tried deleting all volumes and starting over?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
GUNJAN YADU + (gunjanyadu6@gmail.com) +
+
2024-03-18 09:10:49
+
+

*Thread Reply:* Volumes as in?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Kacper Muda + (kacper.muda@getindata.com) +
+
2024-03-18 09:12:21
+
+

*Thread Reply:* Probably docker volumes, you can find them in docker dashboard app:

+ +
+ + + + + + + + + +
+ + +
+ 👍 Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
GUNJAN YADU + (gunjanyadu6@gmail.com) +
+
2024-03-18 09:13:44
+
+

*Thread Reply:* Okay. +It's a password authentication failure. So do I have to do any kind of Postgres setup or environment variable setup?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
GUNJAN YADU + (gunjanyadu6@gmail.com) +
+
2024-03-18 09:24:29
+
+

*Thread Reply:* marquez-db | 2024-03-18 13:19:37.211 GMT [36] FATAL: password authentication failed for user "marquez" +marquez-db | 2024-03-18 13:19:37.211 GMT [36] DETAIL: Role "marquez" does not exist.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
GUNJAN YADU + (gunjanyadu6@gmail.com) +
+
2024-03-18 10:11:43
+
+

*Thread Reply:* Setup is successful

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-18 11:20:43
+
+

*Thread Reply:* @GUNJAN YADU can you share what steps you took to make it work?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
GUNJAN YADU + (gunjanyadu6@gmail.com) +
+
2024-03-19 00:14:17
+
+

*Thread Reply:* First I cleared the volumes. +Then I did the steps mentioned in the link you shared, in Git Bash. +It worked then.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-19 09:00:19
+
+

*Thread Reply:* Ah, so you used GitPod?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
GUNJAN YADU + (gunjanyadu6@gmail.com) +
+
2024-03-21 00:35:58
+
+

*Thread Reply:* No +I haven’t. I ran all the commands in git bash

+ + + +
+ 👍 Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Rohan Doijode + (doijoderohan882@gmail.com) +
+
2024-03-19 08:06:07
+
+

Hi everyone !

+ +

I'm a beginner with this tool.

+ +

My name is Rohan, and I'm facing challenges with Marquez. I have followed the steps mentioned on the website and am facing this error. Please check the attached picture.

+ +
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-19 09:35:16
+
+

*Thread Reply:* Hi Rohan, welcome! There are a number of guides across the OpenLineage and Marquez sites. Would you please share a link to the guide you are using? Also, terminal output as well as version and system information would be helpful. The issue could be a simple config problem or more complicated, but it's impossible to say from the screenshot.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Rohan Doijode + (doijoderohan882@gmail.com) +
+
2024-03-20 01:47:22
+
+

*Thread Reply:* Hi Michael Robinson,

+ +

Thank you for getting back to me on this.

+ +

The link I used for installation : https://openlineage.io/getting-started/

+ +

I have attached the terminal output.

+ +

Docker version : 25.0.3, build 4debf41

+
+
openlineage.io
+ + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Rohan Doijode + (doijoderohan882@gmail.com) +
+
2024-03-20 01:48:55
+
+

*Thread Reply:* Continuing above thread with a screenshot :

+ +
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-20 11:25:36
+
+

*Thread Reply:* Thanks for the details, @Rohan Doijode. Unfortunately, Windows isn't currently supported. To explore OpenLineage+Marquez on Windows we recommend using this pre-configured Marquez Gitpod environment.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Rohan Doijode + (doijoderohan882@gmail.com) +
+
2024-03-21 00:49:41
+
+

*Thread Reply:* Hi @Michael Robinson,

+ +

Thank you for your input.

+ +

My issue has been resolved.

+ + + +
+ 🎉 Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Kacper Muda + (kacper.muda@getindata.com) +
+
2024-03-19 11:37:02
+
+

Hey team! Quick check - has anyone submitted or is planning to submit a CFP for this year's Airflow Summit with an OL talk? Let me know! 🚀

+ + + +
+ ➕ Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-19 11:40:11
+
+

*Thread Reply:* https://sessionize.com/airflow-summit-2024/

+
+
sessionize.com
+ + + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-19 11:40:22
+
+

*Thread Reply:* the CFP is scheduled to close on April 17

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Kacper Muda + (kacper.muda@getindata.com) +
+
2024-03-19 11:40:59
+
+

*Thread Reply:* Yup. I was thinking about submitting one, but don't want to overlap with someone that already did 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Anand Thamothara Dass + (anand_thamotharadass@cable.comcast.com) +
+
2024-03-19 14:54:06
+
+

Hey Team, We are using MWAA (AWS Managed Airflow), which is on version 2.7.2, so we are making use of the Airflow-provided OpenLineage packages. We have a simple test DAG that uses BashOperator, and we would like to use manually annotated lineage, so we have provided the inlets and outlets. But when I run the job, I see the error - Failed to extract metadata using found extractor <airflow.providers.openlineage.extractors.bash.BashExtractor object at 0x7f9446276190> - section/key [openlineage/disabled_for_operators]. Do I need to make any configuration changes?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-19 15:26:53
+
+

*Thread Reply:* hey, there’s a fix for that: https://github.com/apache/airflow/pull/37994 +not released yet.

+ +

Unfortunately, before the release you need to manually set missing entries in configuration

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Anand Thamothara Dass + (anand_thamotharadass@cable.comcast.com) +
+
2024-03-19 16:15:18
+
+

*Thread Reply:* Thanks @Jakub Dardziński. So the temporary fix is to set disabled_for_operators for the unsupported operators? If I do that, do I get my lineage emitted for BashOperator with the manually annotated information?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-19 16:15:59
+
+

*Thread Reply:* I think you should set it for disabled_for_operators, config_path and transport entries (maybe you’ve set some of them already)

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Anand Thamothara Dass + (anand_thamotharadass@cable.comcast.com) +
+
2024-03-19 16:23:25
+
+

*Thread Reply:* Ok . Thanks. Yes I did them already.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Anand Thamothara Dass + (anand_thamotharadass@cable.comcast.com) +
+
2024-03-19 22:03:04
+
+

*Thread Reply:* These are my configurations. It's emitting run events only. I have my manually annotated lineage defined for the BashOperator. When I provide disabled_for_operators, I don't see any errors, but the log clearly says "Skipping extraction for operator BashOperator", so I don't see the inlets & outlets info in Marquez. If I don't provide disabled_for_operators, it fails with the error "Failed to extract metadata using found extractor <airflow.providers.openlineage.extractors.bash.BashExtractor object at 0x7f9446276190> - section/key [openlineage/disabled_for_operators]". So I cannot go either way. Any workaround, or am I making some mistake?

+ +
+ + + + + + + + + +
+
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Kacper Muda + (kacper.muda@getindata.com) +
+
2024-03-20 02:28:53
+
+

*Thread Reply:* Hey @Anand Thamothara Dass, make sure to simply set config_path, disabled_for_operators and transport to empty strings, unless you actually want to use them (e.g. leave transport as it is if it contains the configuration for the backend). The current issue is that an error is raised whenever these entries are missing from the configuration, regardless of their values - they simply need to be present, even as empty strings.

+ +

In your setup I see that you included BashOperator in disabled_for_operators, so that's why it's ignored.
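To illustrate the workaround, here is a minimal sketch (not an official snippet). The environment-variable names follow Airflow's standard `AIRFLOW__SECTION__KEY` override convention; the Marquez URL is a placeholder:

```python
import json
import os

# Placeholder backend URL; adjust to your Marquez endpoint.
transport = {"type": "http", "url": "http://localhost:5000", "endpoint": "api/v1/lineage"}

# Airflow reads [openlineage] entries from AIRFLOW__OPENLINEAGE__* variables.
# The point of the workaround: the keys must exist, even as empty strings.
os.environ["AIRFLOW__OPENLINEAGE__CONFIG_PATH"] = ""
os.environ["AIRFLOW__OPENLINEAGE__DISABLED_FOR_OPERATORS"] = ""
os.environ["AIRFLOW__OPENLINEAGE__TRANSPORT"] = json.dumps(transport)

print(os.environ["AIRFLOW__OPENLINEAGE__TRANSPORT"])
```

On MWAA the same keys can be supplied as configuration overrides instead of raw environment variables.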

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Anand Thamothara Dass + (anand_thamotharadass@cable.comcast.com) +
+
2024-03-20 12:03:55
+
+

*Thread Reply:* Hmm, strange - setting them to empty strings worked. When I display it in the console, I am able to see all the outlets information. But when I transport it to the Marquez endpoint, only run events show up; no dataset information is captured in Marquez. Yet when I build the payload myself outside Airflow and push it using Postman, I am able to see the dataset information in Marquez as well. So I don't know where the issue is: Airflow, OpenLineage, or Marquez 😕

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Kacper Muda + (kacper.muda@getindata.com) +
+
2024-03-20 12:07:07
+
+

*Thread Reply:* Could you share your dag code and task logs for that operator? I think if you use BashOperator and attach inlets and outlets to it, it should work just fine. Also please share the version of Ol package you are using and the name

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Anand Thamothara Dass + (anand_thamotharadass@cable.comcast.com) +
+
2024-03-20 14:57:40
+
+

*Thread Reply:* @Kacper Muda - Got that fixed. The transport was {"type": "http","url":"http://10.80.35.62:3000","endpoint":"api/v1/lineage"}. Got the endpoint removed and kept only {"type": "http","url":"http://10.80.35.62:3000"}. It worked. Didn't think that api/v1/lineage would force capture of run events only. Thanks for all the support !!!

+ + + +
+ 👍 Jakub Dardziński, Kacper Muda +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Rohan Doijode + (doijoderohan882@gmail.com) +
+
2024-03-21 07:44:29
+
+

Hi all,

+ +

We are planning to use OL as Data Lineage Tool.

+ +

We have data in S3 and do use AWS Kinesis. We are looking forward for guidelines to generate graphical representation over Marquez or any other compatible tool.

+ +

This includes lineage on column level and metadata during ETL.

+ +

Thank you in advance

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-03-21 09:06:06
+
+

Hello all, we are struggling with a spark integration with AWS Glue. We have gotten to a configuration that is not causing errors in spark, but it’s not producing any output in the S3 bucket. Can anyone help figure out what’s wrong? (code in thread)

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-03-21 09:06:35
+
+

*Thread Reply:* ```import sys +from awsglue.transforms import * +from awsglue.utils import getResolvedOptions +from pyspark.context import SparkContext +from awsglue.context import GlueContext +from awsglue.job import Job +from pyspark.context import SparkConf +from pyspark.sql import SparkSession

+ +

args = getResolvedOptions(sys.argv, ["JOB_NAME"]) +print(f'the job name received is : {args["JOB_NAME"]}')

+ +

spark1 = SparkSession.builder.appName("OpenLineageExample").config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener").config("spark.openlineage.transport.type", "file").config("spark.openlineage.transport.location", "").config("spark.openlineage.namespace", "AWSGlue").getOrCreate()

+ +

glueContext = GlueContext(sc)

+ +

# Initialize the glue context

+ +

sc = SparkContext(spark1)

+ +

glueContext = GlueContext(spark1) +spark = glueContext.spark_session

+ +

job = Job(glueContext) +job.init(args["JOB_NAME"], args)

+ +

df=spark.read.format("csv").option("header","true").load("s3://<bucket>/input/Master_Extract/") +df.write.format('csv').option('header','true').save(' + + + +

+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-03-21 09:07:05
+
+

*Thread Reply:* cc @Rodrigo Maia since I know you’ve done some AWS glue

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-03-21 11:41:39
+
+

*Thread Reply:* Several things:

+ +
  1. s3 isn't a file system. It is an object storage system. Concretely, this means when an object is written, it's immutable. If you want to update the object, you need to read it in its entirety, modify it, and then write it back.
  2. Java probably doesn't know how to handle the s3 protocol.
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-03-21 11:41:54
+
+

*Thread Reply:* (As opposed the the file protocol)

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-03-21 12:05:15
+
+

*Thread Reply:* OK, so the problem is we've set it to config("spark.openlineage.transport.type", "file") +and then give it s3:// instead of a file path…

+ +

But it’s AWS Glue so we don’t have a local filesystem to save it to.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-03-21 12:05:55
+
+

*Thread Reply:* (I also hear you that S3 isn’t an ideal place for concatenating to a logfile because you can’t concatenate)

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-03-21 12:20:46
+
+

*Thread Reply:* Unfortunately, I have zero experience with Glue.

+ +

Several approaches:

+ +
  1. Emit to Kafka (you can use MSK)
  2. Emit to Kinesis
  3. Emit to Console (perhaps a centralised logging tool, like Cloudwatch will pick it up)
  4. Emit to a local file, but I have no idea how you retrieve that file.
  5. Emit to an HTTP endpoint
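For option 5, a stdlib-only sketch of what emitting to an HTTP endpoint involves. The event shape follows the OpenLineage RunEvent spec, but the producer URI, job names, and backend URL are placeholders; in practice the `openlineage-python` client would do this for you:

```python
import json
import urllib.request
import uuid
from datetime import datetime, timezone

def build_run_event(job_name: str, namespace: str) -> dict:
    # Minimal RunEvent payload per the OpenLineage spec.
    return {
        "eventType": "START",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [],
        "outputs": [],
        "producer": "https://example.com/glue-poc",  # placeholder producer URI
    }

def emit(event: dict, url: str) -> None:
    # POST the event to an OpenLineage-compatible backend such as Marquez.
    req = urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

event = build_run_event("glue_job", "aws_glue")
# emit(event, "http://localhost:5000/api/v1/lineage")  # needs a running backend
print(event["eventType"], event["job"]["name"])  # → START glue_job
```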
+ + + +
+ ☝️ Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-03-21 12:22:25
+
+

*Thread Reply:* I appreciate some ideas for next steps

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-03-21 12:22:30
+
+

*Thread Reply:* Thank you

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Rodrigo Maia + (rodrigo.maia@manta.io) +
+
2024-03-21 12:25:30
+
+

*Thread Reply:* Did you try the console transport to check if the OL setup is working? Regardless of I/O, it should put something in the logs with an event.

+ + + +
+ 👀 Sheeri Cabral (Collibra) +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-03-21 12:36:41
+
+

*Thread Reply:* Assuming the log4j[2].properties file is configured to allow the io.openlineage package to log at the appropriate level.

+ + + +
+ 👀 Sheeri Cabral (Collibra) +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
tati + (tatiana.alchueyr@astronomer.io) +
+
2024-03-22 07:01:47
+
+

*Thread Reply:* @Sheeri Cabral (Collibra), did you try to use a different transport type, as suggested by @Damien Hawes in https://openlineage.slack.com/archives/C01CK9T7HKR/p1711038046057459?thread_ts=1711026366.869199&cid=C01CK9T7HKR? And described in the docs: +https://openlineage.io/docs/integrations/spark/configuration/transport#file

+ +

Or would you like for the OL spark driver to support an additional transport type (e.g. s3) to emit OpenLineage events?

+
+ + +
+ + + } + + Damien Hawes + (https://openlineage.slack.com/team/U05FLJE4GDU) +
+ + + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-03-22 09:40:39
+
+

*Thread Reply:* I will try different transport types, haven’t gotten a chance to yet.

+ + + +
+ 🙌 tati +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
tati + (tatiana.alchueyr@astronomer.io) +
+
2024-03-25 07:05:17
+
+

*Thread Reply:* Thanks, @Sheeri Cabral (Collibra); please let us know how it goes!

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Pooja K M + (pooja.km@philips.com) +
+
2024-04-02 05:06:26
+
+

*Thread Reply:* @Sheeri Cabral (Collibra) did you try the other transport types by any chance?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-04-03 08:32:20
+
+

*Thread Reply:* Sorry, with the holiday long weekend in Europe things are a bit slow. We did, and I just put a message in the #general chat https://openlineage.slack.com/archives/C01CK9T7HKR/p1712147347085319 as we are getting some errors with the spark integration.

+
+ + +
+ + + } + + Sheeri Cabral + (https://openlineage.slack.com/team/U0323HG8C8H) +
+ + + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Rodrigo Maia + (rodrigo.maia@manta.io) +
+
2024-03-22 14:45:12
+
+

I've been testing around with different Spark versions. Does anyone know if OpenLineage works with Spark 2.4.4 (Scala 2.12.10)? I've been getting a lot of errors, but I've only tried OpenLineage versions 1.8+.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-22 16:36:32
+
+

*Thread Reply:* Hi @Rodrigo Maia, OpenLineage does not officially support Spark 2.4.4. The earliest version supported is 2.4.6. See this doc for more information about the supported versions of Spark, Airflow, Dagster, dbt, and Flink.

+ + + +
+ 👍 Rodrigo Maia +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-03-23 04:15:27
+
+

*Thread Reply:* OpenLineage CI runs against 2.4.6 and it is passing. I wouldn't expect any breaking differences between 2.4.4 and 2.4.6, but please let us know if this is the case.

+ + + +
+ 👍 Rodrigo Maia +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-22 15:18:52
+
+

@channel +Thanks to everyone who attended our first Boston meetup, co-sponsored by Astronomer and Collibra and featuring presentations by partners at Collibra, Astronomer and DataDog, this past Tuesday at Microsoft New England. Shout out to @Sheeri Cabral (Collibra), @Jonathan Morin, and @Paweł Leszczyński for presenting and to Sheeri for co-hosting! Topics included: +• "2023 in OpenLineage," a big year that saw: + ◦ 5 new integrations, + ◦ the Airflow Provider launch, + ◦ the addition of static/"design-time" lineage in 1.0.0, + ◦ the addition of column lineage from SQL statements via the SQL parser, + ◦ and 22 releases. +• A demo of Marquez, which now supports column-level lineage in a revamped UI +• Discussion of "Why Do People Use Lineage?" by Sheeri at Collibra, covering: + ◦ differences between design and operational lineage, + ◦ use cases served such as compliance, traceability/provenance, impact analysis, migration validation, and quicker onboarding, + ◦ features of Collibra's lineage +• A demo of streaming support in the Apache Flink integration by Paweł at Astronomer, illustrating lineage from: + ◦ a Flink job reading from a Kafka topic to Postgres, + ◦ a few SQL jobs running queries in Postgres, + ◦ a Flink job taking a Postgres table and publishing it back to Kafka +• A demo of an OpenLineage integration POC at DataDog by Jonathan, covering: + ◦ Use cases served by DataDog's Data Streams Monitoring service + ◦ OpenLineage's potential role providing and standardizing cross-platform lineage for DataDog's monitoring platform. +Thanks to Microsoft for providing the space. +If you're interested in attending, presenting at, or hosting a future meetup, please reach out.

+ +
+ + + + + + + + + +
+
+ + + + + + + + + +
+ + +
+ 🙌 Jonathan Morin, Harel Shein, Rodrigo Maia, Maciej Obuchowski +
+ +
+ :datadog: Harel Shein, Paweł Leszczyński, Rodrigo Maia, Maciej Obuchowski, Jean-Mathieu Saponaro +
+ +
+ 👏 Peter Huang, Rodrigo Maia, tati +
+ +
+ 🎉 tati +
+ +
+ ❤️ Sheeri Cabral (Collibra) +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-25 07:08:21
+
+

*Thread Reply:* Hey @Michael Robinson, was the meetup recorded?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-25 09:26:04
+
+

*Thread Reply:* @Maciej Obuchowski yes, and a clip is on YouTube. Hoping to have @Jonathan Morin’s clip posted soon, as well

+
+
YouTube
+ +
+ + + } + + OpenLineage Project + (https://www.youtube.com/@openlineageproject6897) +
+ + + + + + + + + + + + + + + + + +
+ + + +
+ ❤️ Sheeri Cabral (Collibra) +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Stefan Krawczyk + (stefan@dagworks.io) +
+
2024-03-22 19:57:48
+
+

Airflow 2.8.3 Python 3.11 +Trying to do a hello world lineage example using this simple bash operator DAG — but I don’t have anything emitting to my marquez backend. +I’m running airflow locally following docker-compose setup here. +More details in thread:

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Stefan Krawczyk + (stefan@dagworks.io) +
+
2024-03-22 19:59:45
+
+

*Thread Reply:* Here is my airflow.cfg under +```[webserver] +expose_config = 'True'

+ +

[openlineage] +config_path = '' +transport = '{"type": "http", "url": "http://localhost:5002", "endpoint": "api/v1/lineage"}' +disabled_for_operators = ''```

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Stefan Krawczyk + (stefan@dagworks.io) +
+
2024-03-22 20:01:15
+
+

*Thread Reply:* I can curl my marquez backend just fine — but yeah not seeing anything emitted by airflow

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Stefan Krawczyk + (stefan@dagworks.io) +
+
2024-03-22 20:19:44
+
+

*Thread Reply:* Have I missed something in the set-up? Is there a way I can validate the config was ingested correctly?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-23 03:42:40
+
+

*Thread Reply:* Can you see any logs related to OL in Airflow? Is Marquez in the same docker compose? Maybe try changing to host.docker.internal from localhost

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Stefan Krawczyk + (stefan@dagworks.io) +
+
2024-03-24 00:51:31
+
+

*Thread Reply:* So I figured it out. For reference, the issue was that ./config wasn't read as airflow.cfg, as I had blindly interpreted it to be. Instead, setting the OpenLineage values as environment variables worked.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Stefan Krawczyk + (stefan@dagworks.io) +
+
2024-03-24 01:01:47
+
+

*Thread Reply:* Otherwise for the simple DAG with just BashOperators, I was expecting to see a similar “lineage” DAG in marquez, but I only see individual jobs. Is that expected?

+ +

Formulating my question differently, does the OpenLineage data model always assume a bipartite-style graph of Job -> Dataset -> Job -> Dataset? It seems like there would be cases where you could have Job -> Job with no explicit "data artifact produced".

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Stefan Krawczyk + (stefan@dagworks.io) +
+
2024-03-24 02:13:30
+
+

*Thread Reply:* Another question — is there going to be integration with the “datasets” & inlets/outlets concept airflow now has? +E.g. I would expect the OL integration to capture this:

+ +

```# [START dataset_def] +dag1_dataset = Dataset("", extra={"hi": "bye"})

+ +

[END dataset_def]

+ +

with DAG( + dag_id="dataset_produces_1", + catchup=False, + start_date=pendulum.datetime(2021, 1, 1, tz="UTC"), + schedule="@daily", + tags=["produces", "dataset-scheduled"], +) as dag1: + # [START task_outlet] + BashOperator(outlets=[dag1_dataset], task_id="producing_task_1", bash_command="sleep 5") + # [END task_outlet]``` +i.e. the outlets part. Currently it doesn’t seem to.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-25 03:47:29
+
+

*Thread Reply:* OL only converts File and Table entities so far from manual inlets and outlets

+ + + +
+ 👍 Stefan Krawczyk +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-25 05:00:22
+
+

*Thread Reply:* on the Job -> Dataset -> Job -> Dataset question: OL and Marquez do not aim to reflect Airflow DAGs. Rather, they focus on exposing metadata that is collected around data processing

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Stefan Krawczyk + (stefan@dagworks.io) +
+
2024-03-25 14:27:42
+
+

*Thread Reply:* > on the Job -> Dataset -> Job -> Dataset: OL and Marquez do not aim into reflecting Airflow DAGs. They rather focus on exposing metadata that is collected around data processing +That makes sense. I’m was just thinking through the implications and boundaries of what “lineage” is modeled. Thanks

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Suhas Shenoy + (ksuhasshenoy@gmail.com) +
+
2024-03-25 06:18:05
+
+

Hi Team... We have a use case where we want to know when a column of a table gets updated in BigQuery, and we have some questions related to it.

+ +
  1. In some of the openlineage events that are generated, outputs.facets.columnLineage is null. Can we assume all the columns get updated when this is the case?
  2. Also outputs.facets.schema seems to be null in some of the events generated. How do we get the schema of the table in this case?
  3. output.namespace is also null in some cases. How do we determine output datasource in this case?
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-25 07:07:02
+
+

*Thread Reply:* For BigQuery, we use BigQuery API to get the lineage that unfortunately does not present us with column-level lineage. Adding that would be a feature.

+ +

For 2. and 3. it might happen that the result you're reading is from query cache, as this was earlier executed and not changed - in that case we won't have full information yet. https://cloud.google.com/bigquery/docs/cached-results

+
+
Google Cloud
+ + + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Suhas Shenoy + (ksuhasshenoy@gmail.com) +
+
2024-03-25 07:45:04
+
+

*Thread Reply:* So, can we assume that if the query is not a duplicate one, fields outputs.facets.schema and output.namespace will not be empty? +And ignore the COMPLETE events when those fields are empty as they are not providing any new updates?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-25 07:59:55
+
+

*Thread Reply:* > So, can we assume that if the query is not a duplicate one, fields outputs.facets.schema and output.namespace will not be empty? +Yes, I would assume so. +> And ignore the COMPLETE events when those fields are empty as they are not providing any new updates? +That probably depends on your use case, different jobs can access same tables/do same queries in that case.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Suhas Shenoy + (ksuhasshenoy@gmail.com) +
+
2024-03-25 23:49:46
+
+

*Thread Reply:* Okay. We wanted to know how we can determine the output datasource from the events.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Ruchira Prasad + (ruchiraprasad@gmail.com) +
+
2024-03-26 01:51:15
+
+

Hi Team, +Currently, OpenLineage/Marquez uses a Postgres DB to store the metadata. Instead of Postgres, we want to store it in a Snowflake DB. Is there any kind of inbuilt configuration in the Marquez application to change the Marquez database to Snowflake? If not, what would the approach be?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-03-26 04:50:25
+
+

*Thread Reply:* The last time I looked at Marquez (July last year), Marquez was highly coupled to PostgreSQL-specific functionality. It had code, particularly for the graph traversal, written in PostgreSQL's PL/pgSQL. Furthermore, it uses PostgreSQL as an OLTP database. My limited knowledge of Snowflake says that it is an OLAP database, which means it would be a very poor fit for the application. Any migration to another database engine would be a large undertaking.

+ + + +
+ ☝️ Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-03-26 05:13:25
+
+

*Thread Reply:* Hi @Ruchira Prasad, this is not possible at the moment. Marquez splits OL events into neat relational model to allow efficient lineage queries. I don't think this would be achievable in Snowflake.

+ +

As an alternative approach, you can try fluentd proxy -> https://github.com/OpenLineage/OpenLineage/tree/main/proxy/fluentd +Fluentd provides bunch of useful output plugins that let you send logs into several warehouses (https://www.fluentd.org/plugins), however I cannot find snowflake on the list.

+ +

On the snowflake side, there is quickstart on how to ingest fluentd logs into it -> https://quickstarts.snowflake.com/guide/integrating_fluentd_with_snowflake/index.html#0

+ +

To wrap up: if you need lineage events in Snowflake, you can consider sending events to a FluentD endpoint and then load them to Snowflake. In contrast to Marquez, you will query raw events which may be cumbersome in some cases like getting several OL events that describe a single run.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-26 05:56:39
+
+

*Thread Reply:* Note that supporting (not even migrating) a backend application that can use multiple database engines comes at a huge opportunity cost, and it's not like Marquez has more contributors than it needs 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Ruchira Prasad + (ruchiraprasad@gmail.com) +
+
2024-03-26 06:28:47
+
+

*Thread Reply:* Since both Postgres and Snowflake support JDBC, can't we point to Snowflake by changing the following?

+ +
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-03-26 06:29:16
+
+

*Thread Reply:* It doesn't have anything to do with the driver. JDBC is the driver; it defines the protocol that the communication link must abide by.

+ +

Just like how ODBC is a driver, and in the .NET world, how OLE DB is a driver.

+ +

It tells us nothing about the capabilities of the database. In this case, using PostgreSQL was chosen because of its capabilities, and because of those capabilities, the application code leverages more of those capabilities than just a generic read / write database. Moving all that logic from PostgreSQL PL/pgSQL to the application would (1) take a significant investment in time; (2) present bugs; (3) slow down the application response time, because you have to make many more round-trips to the database, instead of keeping the code close to the data.

+ + + +
+ ☝️ Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-03-26 06:39:57
+
+

*Thread Reply:* If you're still curious, and want to test things out for yourself:

+ +
  1. Create a graph structure on a SQL database (edge table, vertex table, relationship table)
  2. Write SQL to perform that traversal
  3. Write Java application code that reads from the database, then tries to perform traversals by again reading data from the database. +Measure the performance impact, and you will see that (2) is far quicker than (3). This is one of the reasons why Marquez uses PostgreSQL and leverages its PL/pgSQL capabilities, because otherwise the application would be significantly slower for any traversal that is more than a few levels deep.
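A toy illustration of point 2, using SQLite's recursive CTEs from Python's stdlib. This is only a sketch of the idea (a tiny invented edge table); Marquez's actual PL/pgSQL traversal is far more involved:

```python
import sqlite3

# Toy lineage graph: each edge points from an upstream dataset to a downstream one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (parent TEXT, child TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [("raw", "staged"), ("staged", "mart"), ("mart", "report")],
)

# The traversal runs inside the database in one round-trip, analogous to
# Marquez pushing graph walks into PL/pgSQL instead of looping in Java.
rows = conn.execute(
    """
    WITH RECURSIVE downstream(name) AS (
        SELECT 'raw'
        UNION
        SELECT e.child FROM edges e JOIN downstream d ON e.parent = d.name
    )
    SELECT name FROM downstream
    """
).fetchall()
print([r[0] for r in rows])  # everything downstream of (and including) 'raw'
```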
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Bipan Sihra + (bsihra@amazon.com) +
+
2024-03-26 15:57:50
+
+

Hi Team,

+ +

Looking for feedback on the below Problem and Proposal.

+ +

We are using OpenLineage with our AWS EMR clusters to extract lineage and send it to a backend Marquez deployment (also in AWS). This is working fine and we are getting table and column level lineage.

+ +

Problem: We are seeing: +• 15+ OpenLineage events with multiple jobs being shown in Marquez for a single Spark job in EMR. This causes confusion because team members using Marquez are unsure which "job" in Marquez to look at. +• The S3 locations are being populated in the namespace. We wanted to use the namespace for teams. However, having S3 locations in the namespace in a way "pollutes" the list. +I understand the above are not issues/bugs. However, our users want us to "clean up" the Marquez UI.

+ +

Proposal: One idea was to have a Lambda intercept the 10-20 raw OpenLineage events from EMR and then process -> condense them down to 1 event with the job, run, inputs, outputs. And secondly, to swap out the namespace from S3 to actual team names via a lookup we would host ourselves.

+ +

While the above proposal technically could work we wanted to check with the team here if it makes sense, any caveats, alternatives others have used. Ideally, we don't want to own parsing OpenLineage events if there is an existing solution.
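A rough sketch of the condensing step such a Lambda could perform. This is not an existing OpenLineage feature, and the event shape shown is simplified and hypothetical:

```python
def condense(events):
    """Collapse raw OpenLineage events that share a runId into one summary
    event, unioning inputs/outputs (simplified, hypothetical event shape)."""
    by_run = {}
    for ev in events:
        run_id = ev["run"]["runId"]
        agg = by_run.setdefault(run_id, {"job": ev["job"], "inputs": {}, "outputs": {}})
        for side in ("inputs", "outputs"):
            for ds in ev.get(side, []):
                # Deduplicate datasets by (namespace, name).
                agg[side][(ds["namespace"], ds["name"])] = ds
    return [
        {"run": {"runId": rid}, "job": a["job"],
         "inputs": list(a["inputs"].values()),
         "outputs": list(a["outputs"].values())}
        for rid, a in by_run.items()
    ]

raw = [
    {"run": {"runId": "r1"}, "job": {"namespace": "team_a", "name": "spark_job"},
     "inputs": [{"namespace": "s3", "name": "in_table"}], "outputs": []},
    {"run": {"runId": "r1"}, "job": {"namespace": "team_a", "name": "spark_job"},
     "inputs": [{"namespace": "s3", "name": "in_table"}],
     "outputs": [{"namespace": "s3", "name": "out_table"}]},
]
summary = condense(raw)
print(len(summary))  # → 1
```

The team-name remapping would be a similar pass over the `namespace` fields with a lookup table.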

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Bipan Sihra + (bsihra@amazon.com) +
+
2024-03-26 15:58:15
+
+

*Thread Reply:* Screenshot: 1 spark job = multiple "jobs" in Marquez

+ +
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Bipan Sihra + (bsihra@amazon.com) +
+
2024-03-26 15:58:35
+
+

*Thread Reply:* Screenshot: S3 locations in namespace.

+ +
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-26 16:59:48
+
+

*Thread Reply:* Hi @Bipan Sihra, thanks for posting this -- it's exciting to hear about your use case at Amazon! I wonder if you wouldn't mind opening a GitHub issue so we can track progress on this and make sure you get answers to your questions.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-26 17:23:19
+
+

*Thread Reply:* Also, would you please share the version of openlineage-spark you are on?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Bipan Sihra + (bsihra@amazon.com) +
+
2024-03-27 09:05:09
+
+

*Thread Reply:* Hi @Michael Robinson. Sure, I can open a GitHub issue. +Also, we are currently using io.openlineage:openlineage-spark_2.12:1.9.1.

+ + + +
+ 👍 Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Tristan GUEZENNEC -CROIX- + (tristan.guezennec@decathlon.com) +
+
2024-03-28 09:51:12
+
+

*Thread Reply:* @Yannick Libert

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Bipan Sihra + (bsihra@amazon.com) +
+
2024-03-28 09:52:43
+
+

*Thread Reply:* I was able to find info I needed here: https://github.com/OpenLineage/OpenLineage/discussions/597

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Ranvir Singh + (ranvir.tune@gmail.com) +
+
2024-03-27 07:55:37
+
+

Hi Team, we are trying to collect lineage for a Spark job using OpenLineage(v1.8.0) and Marquez (v0.46). We can see the "Schema" details for all "Datasets" created but we can't see "Column-level" lineage and getting "Column lineage not available for the specified dataset" on Marquez UI under "COLUMN LINEAGE" tab.

+ +

About Spark Job: The job reads data from few oracle tables using JDBC connections as Temp views in Spark, performs some transformations (joining & aggregations) over different steps, creating intermediate temp views and finally writing the data to HDFS location. So, it looks something like this:

+ +

Read oracle tables as temp views -&gt; transformations set1 --&gt; creation of few more temp views from previously created temp views --&gt; transformations set2, set3 ... --&gt; Finally writing to hdfs (when all the temp views get materialised in-memory to create the final output dataset). +We are getting the schema details for the finally written dataset but no column-level lineage for it. Also, while checking the JSON lineage data, I can see "" (blank) for the "inputs" key (just before the "outputs" key, which contains the dataset name & other details in nested key-value form). As per my understanding, this explains the null value for the "columnLineage" key and hence no column-level lineage, but I'm unable to understand why!

+ +

We'd appreciate it if you could share some thoughts/ideas on what is going wrong here, as we are stuck on this point. Also, we are not sure whether column-level lineage is only available for datasets created from permanent Hive tables, and not for temp/un-materialised views, when using OpenLineage & Marquez.
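One way to narrow this down is to inspect the raw events before they reach Marquez. A heuristic sketch that flags the symptoms described above (the event shape is simplified; facet names follow the OpenLineage spec):

```python
def diagnose(event):
    """Flag common reasons a single OpenLineage event yields no
    column-level lineage in Marquez (heuristic sketch)."""
    problems = []
    if not event.get("inputs"):
        problems.append("no inputs: column lineage cannot be derived")
    for ds in event.get("outputs", []):
        facets = ds.get("facets") or {}
        if "columnLineage" not in facets:
            problems.append(f"output {ds['name']}: columnLineage facet missing")
    return problems

# Example event with an empty inputs list and a schema-only output facet.
event = {"inputs": [], "outputs": [{"name": "hdfs_out", "facets": {"schema": {}}}]}
print(diagnose(event))
```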

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-27 08:54:38
+
+

*Thread Reply:* My first guess would be that either some of the interaction between JDBC/views/materialization make the CLL not show, or possibly transformations - if you're doing stuff like UDFs we lose the column-level info, but it's hard to confirm without seeing events and/or some minimal reproduction

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Ranvir Singh + (ranvir.tune@gmail.com) +
+
2024-03-29 08:48:03
+
+

*Thread Reply:* Hi @Maciej Obuchowski, thanks for responding on this. +We are using Spark SQL, where we read the data from Oracle tables as temp tables and then run SQL-like queries (for transformation) on the previously created temp tables. +Say we want to run a set of transformations, written as SQL-like queries. When the first query (query1) gets executed it creates temptable1, then query2 gets executed on temptable1 creating temptable2, and so on. For such use cases we have developed a custom function: it takes these queries (query1, query2, ...) as input, runs them iteratively, and creates temptable1, temptable2, and so on. This custom function uses RDD APIs and built-in functions like collect(), along with a few other Scala functions. So, not sure whether the usage of RDDs breaks the lineage or what else is going wrong. +Lastly, we do have jobs that use UDFs directly in Spark, but we aren't getting CLL even for the jobs that don't have any UDF usage. +Hope this gives some context on how we are running the job.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Ranvir Singh + (ranvir.tune@gmail.com) +
+
2024-04-04 13:08:32
+
+

*Thread Reply:* Hey @Maciej Obuchowski, appreciate your help/comments on this.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
George Tong + (george@terradot.earth) +
+
2024-03-27 14:53:44
+
+

Hey everyone 👋

+ +

I’m working at a carbon capture 🌍 company and we’re designing how we want to store data in our PostgreSQL database at the moment. One of the key things we’re focusing on is traceability and transparency of data, as well as ability to edit and maintain historical data. This is key as if we make an error and need to update a previous data point, we want to know everything downstream of that data point that needs to be rerun and recalculated. You might be able to guess where this is going… +• Any advice on how we should be designing our table schemas to support editing and traceability? We’re currently looking using temporal tables +• Is Open Lineage the right tool for downstream tracking and traceability? Are there any other tools we should be looking at instead? +I’m new here so hopefully I asked in the right channel. Let me know if I should be asking elsewhere!
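On the temporal-table idea, a minimal hand-rolled sketch in SQLite to show the shape of the approach (real temporal tables, e.g. PostgreSQL extensions or system-versioned tables, handle this far more robustly; schema and names here are invented):

```python
import sqlite3
from datetime import datetime, timezone

# Never UPDATE in place: close the old row's validity window and insert a
# new version, so every historical value stays queryable.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE measurements (
        site TEXT, value REAL,
        valid_from TEXT, valid_to TEXT  -- NULL valid_to = current version
    )
    """
)

def upsert(site, value):
    now = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "UPDATE measurements SET valid_to = ? WHERE site = ? AND valid_to IS NULL",
        (now, site),
    )
    conn.execute("INSERT INTO measurements VALUES (?, ?, ?, NULL)", (site, value, now))

upsert("plot_a", 1.2)
upsert("plot_a", 1.5)  # correction: the old row is retained with a closed window
current = conn.execute("SELECT value FROM measurements WHERE valid_to IS NULL").fetchall()
print(current)  # → [(1.5,)]
```

Pairing versioned rows like these with lineage metadata (which runs read which version) is where a tool in the OpenLineage space would come in.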

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Kacper Muda + (kacper.muda@getindata.com) +
+
2024-03-28 05:55:48
+
+

*Thread Reply:* Hey, In my opinion, OpenLineage is the right tool for what you are describing. Together with some backend like Marquez it will allow you to visualize data flow, dependencies (upstreams, downstreams) and more 🙂

+ + + +
+ 🙌 George Tong +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-28 15:54:58
+
+

*Thread Reply:* Hi George, welcome! To add to what Kacper said, I think it also depends on what you are looking for in terms of "transparency." I guess I'm wondering exactly what you mean by this. A consumer using the OpenLineage standard (like Marquez, which we recommend in general but especially for getting started) will collect metadata about your pipelines' datasets and jobs but won't collect the data itself or support editing of your data. You're probably fully aware of this, but it's a point of confusion sometimes, and since you mentioned transparency and updating data I wanted to emphasize this. I hope this helps!

+ + + +
+ 🙌 George Tong +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
George Tong + (george@terradot.earth) +
+
2024-03-28 19:28:36
+
+

*Thread Reply:* Thanks for the thoughts folks! Yes I think my thoughts are starting to become more concrete - retaining a history of data and ensuring that you can always go back to a certain time of your data is different from understanding the downstream impact of a data change (which is what OpenLineage seems to tackle)

+ + + +
+
+
+
+ + + + + +
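To make the distinction above concrete, here is a minimal sketch (not from the thread; all dataset and job names are illustrative) of the run-event JSON an OpenLineage producer would POST to a consumer such as Marquez, linking an input dataset to the output it feeds:

```python
# A hand-rolled sketch of an OpenLineage run event (normally the official
# client would build this); namespaces and names below are made up.
import json
import uuid
from datetime import datetime, timezone

event = {
    "eventType": "COMPLETE",
    "eventTime": datetime.now(timezone.utc).isoformat(),
    "run": {"runId": str(uuid.uuid4())},
    "job": {"namespace": "carbon-capture", "name": "recalculate_metrics"},
    "inputs": [{"namespace": "postgres://db:5432", "name": "public.raw_samples"}],
    "outputs": [{"namespace": "postgres://db:5432", "name": "public.derived_metrics"}],
    "producer": "https://example.com/adhoc-producer",
    "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
}
print(json.dumps(event, indent=2))
```

A consumer stitching many such events together is what yields the downstream graph: any job whose inputs include `public.derived_metrics` is downstream of this run.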
+
+ + + + +
+ +
Anirudh Shrinivason + (anirudh.shrinivason@grabtaxi.com) +
+
2024-03-28 03:18:42
+
+

Hi team, so we're using OL v 1.3.1 on databricks, on a non-terminating cluster. We're seeing that the heap memory is increasing very significantly, and we notice that the majority of the memory comes from OL. Any idea if we're having some memory leaks from OL? Have we seen any similar issues being reported before? Thanks!

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-28 10:43:36
+
+

*Thread Reply:* First idea would be to bump version 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-03-28 10:56:55
+
+

*Thread Reply:* Does it affect all the jobs or just some of them? Does it somehow correlate with amount of spark tasks a job is processing? Would you be able to test the behaviour on the jar prepared from the branch? Any other details helping to reproduce this would be nice.

+ +

So many questions for a start... Happy to see you again, @Anirudh Shrinivason. Can't wait to look into this next week.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-03-28 11:12:23
+
+

*Thread Reply:* FYI - this is my experience as discussed on Tuesday @Paweł Leszczyński @Maciej Obuchowski

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Anirudh Shrinivason + (anirudh.shrinivason@grabtaxi.com) +
+
2024-04-01 05:31:09
+
+

*Thread Reply:* Hey @Maciej Obuchowski @Paweł Leszczyński Thanks for the questions! Here are some details and clarifications I have:

+ +
  1. "First idea would be to bump version" Has such an issue been fixed in the later versions? So is this an already known issue with the 1.3.1 version? Just curious why bumping it might resolve the issue...
  2. "Does it affect all the jobs or just some of them?" So far, we're monitoring the heap at a cluster level... It's a shared non-terminating cluster. I'll try to take a look at a job level to get some more insights.
  3. "Does it somehow correlate with amount of spark tasks a job is processing?" This was my initial thought too, but from looking at a few of the pipelines, they seem relatively straightforward logic-wise. And I don't think it's because a lot of tasks are running in parallel causing the amount of allocated objects to be very high... (Let me check back on this)
  4. "Any other details helping to reproduce this would be nice." Yes! Let me try to dig a little more, and try to get back with more details...
  5. "FYI - this is my experience as discussed on Tuesday" Hi @Damien Hawes, may I check if there is anywhere I could get some more information on your observations? Since it seems related, maybe they're the same issues? +But all in all, I ran a high-level memory analyzer, and it seemed to look like a memory leak from the OL jar... We noticed the heap size from OL almost monotonically increasing to >600mb... +I'll try to check and do a bit more analysis before getting back with more details. :gratitudethankyou:
+
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Anirudh Shrinivason + (anirudh.shrinivason@grabtaxi.com) +
+
2024-04-02 00:52:32
+
+

*Thread Reply:* This is what the heap dump looks like after 45 mins btw... ~11gb from openlineage out of 14gb heap

+ + + + +
+ ❤️ Paweł Leszczyński, Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-02 03:34:50
+
+

*Thread Reply:* Nice. That's slightly different to my experience. We're running a streaming pipeline on a conventional Spark cluster (not databricks).

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 04:56:13
+
+

*Thread Reply:* OK. I've found the bug. I will create an issue for it.

+ +

cc @Maciej Obuchowski @Paweł Leszczyński

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 04:59:49
+
+

*Thread Reply:* Great. I am also looking into unknown facet. I think this could be something like this -> https://github.com/OpenLineage/OpenLineage/pull/2557/files

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 05:00:25
+
+

*Thread Reply:* Not quite.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 05:01:00
+
+

*Thread Reply:* The problem is that the UnknownEntryFacetListener accumulates state, even if the spark_unknown facet is disabled.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 05:01:41
+
+

*Thread Reply:* The problem is that the code eagerly calls UnknownEntryFacetListener#apply

+ + + +
+ 🙌 Paweł Leszczyński +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 05:01:54
+
+

*Thread Reply:* Without checking if the facet is disabled or not.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 05:02:17
+
+

*Thread Reply:* It only checks whether the facet is disabled or not, when it needs to add the details to the event.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 05:03:40
+
+

*Thread Reply:* Furthermore, even if the facet is enabled, it never clears its state.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 05:04:31
+
+

*Thread Reply:* yes, and if logical plan is spark.createDataFrame with local data, this can get huge

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 06:01:10
+
+

*Thread Reply:* https://github.com/OpenLineage/OpenLineage/issues/2561

+ + + +
+ 👍 Paweł Leszczyński +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Anirudh Shrinivason + (anirudh.shrinivason@grabtaxi.com) +
+
2024-04-03 06:20:51
+
+

*Thread Reply:* 🙇

+ + + +
+
+
+
+ + + + + +
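The leak pattern described in this thread can be sketched in Python (the real code is Java in the Spark integration; the class and method names here are only illustrative): state accumulates eagerly for every plan node, while the "is the facet disabled?" check happens only at emit time, so the retained objects grow without bound:

```python
# Illustrative sketch of the reported bug pattern, not the actual OL code:
# entries are collected eagerly even when the facet will never be emitted,
# and the collection is never cleared between runs.
class LeakyFacetListener:
    def __init__(self, facet_disabled: bool):
        self.facet_disabled = facet_disabled
        self.visited = []            # grows forever: never cleared per run

    def apply(self, plan_node):      # called eagerly for every plan node
        self.visited.append(plan_node)

    def build_facet(self):
        if self.facet_disabled:      # the check happens too late:
            return None              # memory is already retained
        return {"unknown": list(self.visited)}


listener = LeakyFacetListener(facet_disabled=True)
for node in range(3):
    listener.apply(node)

assert listener.build_facet() is None   # facet is suppressed...
assert len(listener.visited) == 3       # ...but the state is still retained
```

The fix implied by the discussion is to check the disabled-facet setting before accumulating, and to clear the collected state once an event has been emitted.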
+
+ + + + +
+ +
Tom Linton + (tom.linton@atlan.com) +
+
2024-03-28 21:50:01
+
+

Hello All - I've begun my OL journey rather recently and am running into trouble getting lineage going in an airflow job. I spun up a quick flask server to accept and print the OL requests. It appears that there are no Inputs or Outputs. Is that something I have to set in my DAG? Reference code and responses are attached.

+ +
+ + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-29 03:38:18
+
+

*Thread Reply:* hook-level lineage is not yet supported, you should use SnowflakeOperator instead

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Tom Linton + (tom.linton@atlan.com) +
+
2024-03-29 08:53:29
+
+

*Thread Reply:* Thanks @Jakub Dardziński! I used the hook because it looks like that is the supported operator based on airflow docs

+ +
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-29 09:20:10
+
+

*Thread Reply:* you can see this is under SQLExecuteQueryOperator +without going into the details, part of the implementation is on the hook's side there, not the operator

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Vinnakota Priyatam Sai + (vinnakota.priyatam@walmart.com) +
+
2024-03-29 00:14:17
+
+

Hi team, we are collecting OpenLineage events across different jobs where the output datasources are BQ, Cassandra and Postgres. We are mostly interested in the freshness of columns across these different datasources. Using OpenLineage COMPLETE event's dataset.datasource and dataset.schema we want to understand which columns are updated at what time.

+ +

We have a few questions related to BQ (as output dataset) events:

+ +
  1. How to identify if the output datasource is BQ, Cassandra or Postgres?
  2. Can we rely on dataset.datasource and dataset.schema for BQ table name and column names?
  3. Even if one column is updated, do we get all the column details in dataset.schema?
  4. If dataset.datasource or dataset.schema value is null, can we assume that no column has been updated in that event?
  5. Are there any sample BQ events that we can refer to understand the events?
  6. Is it possible to get columnLineage details for BQ as output datasource?
+
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-29 10:11:28
+
+

*Thread Reply:* > 1. How to identify if the output datasource is BQ, Cassandra or Postgres? +The dataset namespace would contain that information: for example, the namespace for BQ would be simply bigquery and for Postgres it would be postgres://{host}:{port}

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-29 10:15:06
+
+

*Thread Reply:* > 1. Can we rely on dataset.datasource and dataset.schema for BQ table name and column names? +> 2. Even if one column is updated, do we get all the column details in dataset.schema? +> 3. If dataset.datasource or dataset.schema value is null, can we assume that no column has been updated in that event? +If talking about BigQuery Airflow operators, the known issue is BigQuery query caching. You're guaranteed to get this information if the query is running for the first time, but if the query is just reading from the cache instead of being executed, we don't get that information. That would result in a run without actual input dataset data.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-03-29 10:15:56
+
+

*Thread Reply:* > 1. Is it possible to get columnLineage details for BQ as output datasource? +The BigQuery API does not give us this information yet - we could augment the API data with the SQL parser's output, though. It's a feature that doesn't exist yet.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Vinnakota Priyatam Sai + (vinnakota.priyatam@walmart.com) +
+
2024-03-29 10:18:32
+
+

*Thread Reply:* This is very helpful, thanks a lot @Maciej Obuchowski

+ + + +
+
+
+
+ + + + + +
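The namespace convention described above lends itself to a small helper for classifying which store a dataset lives in. This is an illustrative sketch, not part of any OpenLineage client, and it assumes namespaces follow the conventions mentioned in the thread:

```python
# Classify a dataset's backing store from its OpenLineage namespace,
# assuming the conventions discussed above (BQ is the bare string
# "bigquery"; host-addressed stores use a scheme prefix).
def datasource_type(namespace: str) -> str:
    if namespace == "bigquery":
        return "bigquery"
    for scheme in ("postgres", "cassandra", "kafka", "s3"):
        if namespace.startswith(scheme + "://"):
            return scheme
    return "unknown"


assert datasource_type("bigquery") == "bigquery"
assert datasource_type("postgres://db:5432") == "postgres"
assert datasource_type("hdfs://nn:8020") == "unknown"
```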
+
+ + + + +
+ +
Mark Dunphy + (markd@spotify.com) +
+
2024-03-29 11:54:02
+
+

Hi all, we are trying to use dbt-ol to capture lineage. We use dbt custom aliases based on the --target flag passed in to dbt-ol run. So for example if using --target dev the model alias might be some_prefix__model_a whereas with --target prod the model alias might be model_a without any prefix. OpenLineage doesn't seem to pick up on this custom alias and sends model_a regardless in the input/output. Is this intended? I'm relatively new to this data world so it is possible I'm missing something basic here.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-03-29 15:52:17
+
+

*Thread Reply:* Welcome and thanks for using OpenLineage! Someone with dbt expertise will reply soon.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-29 18:21:35
+
+

*Thread Reply:* looks like it’s another entry in manifest.json : https://schemas.getdbt.com/dbt/manifest/v10.json

+ +

called alias that is not taken into consideration

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-29 18:22:24
+
+

*Thread Reply:* it needs more analysis whether and how this entry is set

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-29 18:30:06
+
+

*Thread Reply:* btw how do you create alias per target? I did this:

+ +
-- Use the `ref` function to select from other models
+{% if target.name != 'prod' %}
+{{ config(materialized='incremental',unique_key='id',
+        on_schema_change='sync_all_columns', alias='third_model_dev'
+) }}
+{% else %}
+{{ config(materialized='incremental',unique_key='id',
+        on_schema_change='sync_all_columns', alias='third_model_prod'
+) }}
+{% endif %}
+
+select x.id, lower(y.name)
+from {{ ref('my_first_dbt_model') }} as x
+left join {{ ref('my_second_dbt_model' )}} as y
+ON x.id = y.i
+
+ +

but I’m curious if that’s correct scenario to test

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mark Dunphy + (markd@spotify.com) +
+
2024-04-01 09:31:26
+
+

*Thread Reply:* thanks for looking into this @Jakub Dardziński! we are using the generate_alias_name macro to control this. our macro looks very similar to this example

+ + + +
+
+
+
+ + + + + +
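The alias behavior discussed in this thread can be illustrated with a short sketch. The manifest layout is assumed from the dbt manifest schema linked above; the point is simply that `alias` should take precedence over `name` when resolving the relation a model materializes to:

```python
# Resolve the effective relation name for a dbt node, preferring `alias`
# over `name`. The manifest dict below is a hand-written stand-in for the
# relevant slice of dbt's manifest.json.
def effective_name(manifest: dict, node_id: str) -> str:
    node = manifest["nodes"][node_id]
    return node.get("alias") or node["name"]


manifest = {
    "nodes": {
        "model.proj.model_a": {"name": "model_a", "alias": "some_prefix__model_a"},
        "model.proj.model_b": {"name": "model_b", "alias": None},
    }
}

assert effective_name(manifest, "model.proj.model_a") == "some_prefix__model_a"
assert effective_name(manifest, "model.proj.model_b") == "model_b"
```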
+
+ + + + +
+ +
Tom Linton + (tom.linton@atlan.com) +
+
2024-03-29 12:37:48
+
+

Is it possible to configure OL to only send OL Events for certain dags in airflow?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-29 14:22:30
+
+

*Thread Reply:* it will be possible once the latest version of the OL provider is released with this PR: +https://github.com/apache/airflow/pull/37725

+
+ + + + + + + +
+
Labels
+ area:providers, area:dev-tools, kind:documentation, provider:openlineage +
+ + + + + + + + + + +
+ + + +
+ ✅ Tom Linton +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Tom Linton + (tom.linton@atlan.com) +
+
2024-03-29 16:09:16
+
+

*Thread Reply:* Thanks!

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Tom Linton + (tom.linton@atlan.com) +
+
2024-03-29 13:10:52
+
+

Is it common to see this error?

+ +
+ + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-03-29 17:32:07
+
+

*Thread Reply:* seems like trim in select statements causes issues

+ + + +
+ ✅ Tom Linton +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-04-01 10:04:45
+
+

@channel +I'd like to open a vote to release OpenLineage 1.11.0, including: +• Spark: lineage metadata extraction built-in to Spark extensions +• Spark: change SparkPropertyFacetBuilder to support recording Spark runtime config +• Java client: add metrics-gathering mechanism +• Flink: support Flink 1.19.0 +• SQL: show error message when OpenLineageSql cannot find native library +Three +1s from committers will authorize. Thanks!

+ + + +
+ ➕ Harel Shein, Rodrigo Maia, Jakub Dardziński, alexandre bergere, Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-04-04 09:44:38
+
+

*Thread Reply:* Thanks, all. The release is authorized and will be performed within 2 business days excluding tomorrow.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-04-01 16:13:24
+
+

@channel +The latest issue of OpenLineage News is available now, featuring a rundown of upcoming and recent events, recent releases, updates to the Airflow Provider, open proposals, and more. +To get the newsletter directly in your inbox each month, sign up here. +openlineage.us14.list-manage.com

+
+
openlineage.us14.list-manage.com
+ + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Pooja K M + (pooja.km@philips.com) +
+
2024-04-02 06:01:39
+
+

Hi All, we are trying to transform entities according to the medallion model, where each entity goes through multiple layers of data transformation. The workflow is: data is picked from a Kafka channel and stored into Parquet, then transformed into Hudi tables in the silver layer. Now we are trying to capture lineage data. So far we have tried with transport type console, but we are not seeing the lineage data in the console (we are running this job from AWS Glue). Below is the configuration we have added. +spark = (SparkSession.builder + .appName('samplelineage') + .config('spark.jars.packages', 'io.openlineage:openlineagespark:1.8.0') + .config('spark.extraListeners', 'io.openlineage.spark.agent.OpenLineageSparkListener') + .config('spark.openlineage.namespace', 'LineagePortTest') + .config('spark.openlineage.parentJobNamespace', 'LineageJobNameSpace') + .config("spark.openlineage.transport.type", "console") + .config('spark.openlineage.parentJobName', 'LineageJobName') + .getOrCreate())

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-02 07:24:13
+
+

*Thread Reply:* Does Spark tell you during startup that it is adding the listener?

+ +

The log line should be something like "Adding io.openlineage.spark.agent.OpenLineageSparkListener"

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-02 07:24:58
+
+

*Thread Reply:* Additionally, ensure your log4j.properties / log4j2.properties (depending on the version of Spark that you are using) allows io.openlineage at info level

+ + + +
+
+
+
+ + + + + +
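As a reference point, a minimal log4j.properties fragment along the lines suggested above (the exact file and syntax depend on your Spark and log4j versions, so treat this as an assumption to adapt, e.g. to log4j2 syntax):

```properties
# Surface OpenLineage integration logs at INFO level
log4j.logger.io.openlineage=INFO
```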
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-02 08:04:16
+
+

*Thread Reply:* I think, as usual, hudi is the problem 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-02 08:04:35
+
+

*Thread Reply:* or are you just not seeing any OL logs/events?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-02 08:05:31
+
+

*Thread Reply:* as @Damien Hawes said, you should see Spark log +org.apache.spark.SparkContext - Registered listener io.openlineage.spark.agent.OpenLineageSparkListener

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Pooja K M + (pooja.km@philips.com) +
+
2024-04-02 09:24:00
+
+

*Thread Reply:* yes I could see the mentioned logs in the console while job runs

+ +
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Pooja K M + (pooja.km@philips.com) +
+
2024-04-02 09:30:17
+
+

*Thread Reply:* Also we are not seeing OL events

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 08:32:49
+
+

*Thread Reply:* do you see any errors or other logs that could be relevant to OpenLineage? +also, some simple reproduction might help

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Pooja K M + (pooja.km@philips.com) +
+
2024-04-03 09:06:18
+
+

*Thread Reply:* ya, we could see the below log: INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerSQLExecutionEnd

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 14:04:07
+
+

Hi All! Im trying to set up OpenLineage with Managed Flink at AWS. but im getting this error:

+ +
`"throwableInformation": "io.openlineage.client.transports.HttpTransportResponseException: code: 400, response: \n\tat io.openlineage.client.transports.HttpTransport.throwOnHttpError(HttpTransport.java:151)\n\tat`
+
+ +

This is what i see in marquez. where is flink is trying to send the open lineage events

+ +

items +"message":string"The Job Result cannot be fetch..." +"_producer":string"<https://github.com/OpenLineage>..." +"_schemaURL":string"<https://openlineage.io/spec/fa>..." +"stackTrace":string"org.apache.flink.util.FlinkRuntimeException: The Job Result cannot be fetched through the Job Client when in Web Submission. at org.apache.flink.client.deployment.application.WebSubmissionJobClient.getJobExecutionResult(WebSubmissionJobClient.java:92) at

+ +

Im passing the conf like this:

+ +

Properties props = new Properties(); +props.put("openlineage.transport.type","http"); +props.put("openlineage.transport.url","http://<marquez-ip>:5000/api/v1/lineage"); +props.put("execution.attached","true"); +Configuration conf = ConfigurationUtils.createConfiguration(props); +StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-02 14:26:12
+
+

*Thread Reply:* Hey @Francisco Morillo, which version of Marquez are you running? Streaming support was a relatively recent addition to Marquez

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 14:29:32
+
+

*Thread Reply:* So i was able to set it up working locally. Having Flink integrated with open lineage

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 14:29:43
+
+

*Thread Reply:* But once i deployed marquez in an ec2 using docker

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 14:30:16
+
+

*Thread Reply:* and have managed flink trying to emit events to openlineage i just receive the flink job event, but not the kafka source / iceberg sink

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 14:32:31
+
+

*Thread Reply:* I ran this: +$ git clone git@github.com:MarquezProject/marquez.git &amp;&amp; cd marquez

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-02 14:50:41
+
+

*Thread Reply:* hmmm. I see. you're probably running the latest version of marquez then, should be ok. +did you try the console transport first to see what the events look like?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-02 14:51:10
+
+

*Thread Reply:* kafka source and iceberg sink should be well supported for flink

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 14:54:31
+
+

*Thread Reply:* i believe there is an issue with how the conf is passed to flink job in managed flink

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-02 14:55:37
+
+

*Thread Reply:* ah, that may be the case. what are you seeing in the flink job logs?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-02 14:59:02
+
+

*Thread Reply:* I think setting execution.attached might not work when you set it this way

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-02 15:05:05
+
+

*Thread Reply:* is there an option to use regular flink-conf.yaml?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 15:48:34
+
+

*Thread Reply:* in the flink logs im seeing the io.openlineage.client.transports.HttpTransportResponseException: code: 400, response: \n\tat.

+ +

in marquez im seeing the job result cannot be fetched.

+ +

we cant modify flink-conf in managed flink

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 15:49:39
+
+

*Thread Reply:*

+ + + + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 15:49:54
+
+

*Thread Reply:* this is what i see at marquez at ec2

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-02 15:50:58
+
+

*Thread Reply:* hmmm.. I'm wondering if the issue is with Marquez processing the events or the openlineage events themselves. +can you try with: +props.put("openlineage.transport.type","console"); +?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 15:51:08
+
+

*Thread Reply:* compared to what i see locally. Locally it is the same job but just writing to localhost marquez, but im passing the openlineage conf through env

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 15:52:50
+
+

*Thread Reply:* @Harel Shein when set to console, where will the events be printed? Cloudwatch logs?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-02 15:53:17
+
+

*Thread Reply:* I think so, yes

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 15:53:20
+
+

*Thread Reply:* let me try

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-02 15:53:39
+
+

*Thread Reply:* the same place you're seeing your flink logs right now

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-02 15:54:12
+
+

*Thread Reply:* the same place you found that client exception

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 16:09:34
+
+

*Thread Reply:* I will post the events

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 16:09:37
+
+

*Thread Reply:* "logger": "io.openlineage.flink.OpenLineageFlinkJobListener", "message": "onJobSubmitted event triggered for flink-jobs-prod.kafka-iceberg-prod", "messageSchemaVersion": "1", "messageType": "INFO", "threadName": "Flink-DispatcherRestEndpoint-thread-4" }

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 16:09:52
+
+

*Thread Reply:* "locationInformation": "io.openlineage.flink.TransformationUtils.processLegacySinkTransformation(TransformationUtils.java:90)", "logger": "io.openlineage.flink.TransformationUtils", "message": "Processing legacy sink operator Print to System.out", "messageSchemaVersion": "1", "messageType": "INFO", "threadName": "Flink-DispatcherRestEndpoint-thread-4" }

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 16:10:08
+
+

*Thread Reply:* "locationInformation": "io.openlineage.flink.TransformationUtils.processLegacySinkTransformation(TransformationUtils.java:90)", "logger": "io.openlineage.flink.TransformationUtils", "message": "Processing legacy sink operator org.apache.flink.streaming.api.functions.sink.DiscardingSink@68d0a141", "messageSchemaVersion": "1", "messageType": "INFO", "threadName": "Flink-DispatcherRestEndpoint-thread-4" }

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 16:10:46
+
+

*Thread Reply:* "locationInformation": "io.openlineage.client.transports.ConsoleTransport.emit(ConsoleTransport.java:21)", "logger": "io.openlineage.client.transports.ConsoleTransport", "message": "{\"eventTime\":\"2024_04_02T20:07:03.30108Z\",\"producer\":\"<https://github.com/OpenLineage/OpenLineage/tree/1.10.2/integration/flink>\",\"schemaURL\":\"<https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent>\",\"eventType\":\"START\",\"run\":{\"runId\":\"cda9a0d2_6dfd_4db2_b3d0_f11d7b082dc0\"},\"job\":{\"namespace\":\"flink_jobs_prod\",\"name\":\"kafka-iceberg-prod\",\"facets\":{\"jobType\":{\"_producer\":\"<https://github.com/OpenLineage/OpenLineage/tree/1.10.2/integration/flink>\",\"_schemaURL\":\"<https://openlineage.io/spec/facets/2-0-2/JobTypeJobFacet.json#/$defs/JobTypeJobFacet>\",\"processingType\":\"STREAMING\",\"integration\":\"FLINK\",\"jobType\":\"JOB\"}}},\"inputs\":[{\"namespace\":\"<kafka://b-1.mskflinkopenlineage>.&lt;&gt;.<http://kafka.us-east-1.amazonaws.com:9092,b_3.mskflinkopenlineage.&lt;&gt;kafka.us_east_1.amazonaws.com:9092,b-2.mskflinkopenlineage.&lt;&gt;.c22.kafka.us-east-1.amazonaws.com:9092\%22,\%22name\%22:\%22temperature-samples\%22,\%22facets\%22:{\%22schema\%22:{\%22_producer\%22:\%22&lt;https://github.com/OpenLineage/OpenLineage/tree/1.10.2/integration/flink&gt;\%22,\%22_schemaURL\%22:\%22&lt;https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet&gt;\%22,\%22fields\%22:[{\%22name\%22:\%22sensorId\%22,\%22type\%22:\%22int\%22},{\%22name\%22:\%22room\%22,\%22type\%22:\%22string\%22},{\%22name\%22:\%22temperature\%22,\%22type\%22:\%22float\%22},{\%22name\%22:\%22sampleTime\%22,\%22type\%22:\%22long\%22}]}}|kafka.us_east_1.amazonaws.com:9092,b-3.mskflinkopenlineage.&lt;&gt;kafka.us-east-1.amazonaws.com:9092,b_2.mskflinkopenlineage.&lt;&gt;.c22.kafka.us_east_1.amazonaws.com:9092\",\"name\":\"temperature_samples\",\"facets\":{\"schema\":{\"_producer\":\"&lt;https://github.com/OpenLineage/OpenLinea
ge/tree/1.10.2/integration/flink&gt;\",\"_schemaURL\":\"&lt;https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet&gt;\",\"fields\":[{\"name\":\"sensorId\",\"type\":\"int\"},{\"name\":\"room\",\"type\":\"string\"},{\"name\":\"temperature\",\"type\":\"float\"},{\"name\":\"sampleTime\",\"type\":\"long\"}]}}>}],\"outputs\":[{\"namespace\":\"<s3://iceberg-open-lineage-891377161433>\",\"name\":\"/iceberg/open_lineage.db/open_lineage_room_temperature_prod\",\"facets\":{\"schema\":{\"_producer\":\"<https://github.com/OpenLineage/OpenLineage/tree/1.10.2/integration/flink>\",\"_schemaURL\":\"<https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet>\",\"fields\":[{\"name\":\"room\",\"type\":\"STRING\"},{\"name\":\"temperature\",\"type\":\"FLOAT\"},{\"name\":\"sampleCount\",\"type\":\"INTEGER\"},{\"name\":\"lastSampleTime\",\"type\":\"TIMESTAMP\"}]}}}]}",

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 16:11:12
+
+

*Thread Reply:* locationInformation": "io.openlineage.flink.tracker.OpenLineageContinousJobTracker.startTracking(OpenLineageContinousJobTracker.java:100)", "logger": "io.openlineage.flink.tracker.OpenLineageContinousJobTracker", "message": "Starting tracking thread for jobId=de9e0d5b5d19437910975f231d5ed4b5", "messageSchemaVersion": "1", "messageType": "INFO", "threadName": "Flink-DispatcherRestEndpoint-thread-4" }

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 16:11:25
+
+

*Thread Reply:* "locationInformation": "io.openlineage.flink.OpenLineageFlinkJobListener.onJobExecuted(OpenLineageFlinkJobListener.java:191)", "logger": "io.openlineage.flink.OpenLineageFlinkJobListener", "message": "onJobExecuted event triggered for flink-jobs-prod.kafka-iceberg-prod", "messageSchemaVersion": "1", "messageType": "INFO", "threadName": "Flink-DispatcherRestEndpoint-thread-4" }

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 16:11:41
+
+

*Thread Reply:* "locationInformation": "io.openlineage.flink.tracker.OpenLineageContinousJobTracker.stopTracking(OpenLineageContinousJobTracker.java:120)", "logger": "io.openlineage.flink.tracker.OpenLineageContinousJobTracker", "message": "stop tracking", "messageSchemaVersion": "1", "messageType": "INFO", "threadName": "Flink-DispatcherRestEndpoint-thread-4" }

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 16:12:07
+
+

*Thread Reply:* "locationInformation": "io.openlineage.client.transports.ConsoleTransport.emit(ConsoleTransport.java:21)", "logger": "io.openlineage.client.transports.ConsoleTransport", "message": "{\"eventTime\":\"2024_04_02T20:07:04.028017Z\",\"producer\":\"<https://github.com/OpenLineage/OpenLineage/tree/1.10.2/integration/flink>\",\"schemaURL\":\"<https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent>\",\"eventType\":\"FAIL\",\"run\":{\"runId\":\"cda9a0d2_6dfd_4db2_b3d0_f11d7b082dc0\",\"facets\":{\"errorMessage\":{\"_producer\":\"<https://github.com/OpenLineage/OpenLineage/tree/1.10.2/integration/flink>\",\"_schemaURL\":\"<https://openlineage.io/spec/facets/1-0-0/ErrorMessageRunFacet.json#/$defs/ErrorMessageRunFacet>\",\"message\":\"The Job Result cannot be fetched through the Job Client when in Web Submission.\",\"programmingLanguage\":\"JAVA\",\"stackTrace\":\"org.apache.flink.util.FlinkRuntimeException: The Job Result cannot be fetched through the Job Client when in Web Submission.\\n\\tat org.apache.flink.client.deployment.application.WebSubmissionJobClient.getJobExecutionResult(WebSubmissionJobClient.java:92)\\n\\tat org.apache.flink.client.program.StreamContextEnvironment.getJobExecutionResult(StreamContextEnvironment.java:152)\\n\\tat org.apache.flink.client.program.StreamContextEnvironment.execute(StreamContextEnvironment.java:123)\\n\\tat org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1969)\\n\\tat com.amazonaws.services.msf.KafkaStreamingJob.main(KafkaStreamingJob.java:342)\\n\\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\\n\\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\\n\\tat java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\\n\\tat java.base/java.lang.reflect.Method.invoke(Method.java:566)\\n\\tat 
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)\\n\\tat org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)\\n\\tat org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)\\n\\tat org.apache.flink.client.deployment.application.DetachedApplicationRunner.tryExecuteJobs(DetachedApplicationRunner.java:84)\\n\\tat org.apache.flink.client.deployment.application.DetachedApplicationRunner.run(DetachedApplicationRunner.java:70)\\n\\tat org.apache.flink.runtime.webmonitor.handlers.JarRunOverrideHandler.lambda$handleRequest$3(JarRunOverrideHandler.java:239)\\n\\tat java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)\\n\\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\\n\\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\\n\\tat java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)\\n\\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\\n\\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\\n\\tat java.base/java.lang.Thread.run(Thread.java:829)\\n\"}}},\"job\":{\"namespace\":\"flink_jobs_prod\",\"name\":\"kafka-iceberg-prod\",\"facets\":{\"jobType\":{\"_producer\":\"<https://github.com/OpenLineage/OpenLineage/tree/1.10.2/integration/flink>\",\"_schemaURL\":\"<https://openlineage.io/spec/facets/2-0-2/JobTypeJobFacet.json#/$defs/JobTypeJobFacet>\",\"processingType\":\"STREAMING\",\"integration\":\"FLINK\",\"jobType\":\"JOB\"}}}}", "messageSchemaVersion": "1", "messageType": "INFO", "threadName": "Flink-DispatcherRestEndpoint-thread-4" }

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 16:15:35
+
+

*Thread Reply:* this is what I see in CloudWatch when set to console

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 16:17:50
+
+

*Thread Reply:* So it's nothing to do with Marquez but with OpenLineage and Flink

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-02 16:22:10
+
+

*Thread Reply:* hmm.. the start event actually looks pretty good to me: +{ + "eventTime": "2024-04-02T20:07:03.30108Z", + "producer": "<https://github.com/OpenLineage/OpenLineage/tree/1.10.2/integration/flink>", + "schemaURL": "<https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent>", + "eventType": "START", + "run": { + "runId": "cda9a0d2-6dfd-4db2-b3d0-f11d7b082dc0" + }, + "job": { + "namespace": "flink-jobs-prod", + "name": "kafka-iceberg-prod", + "facets": { + "jobType": { + "_producer": "<https://github.com/OpenLineage/OpenLineage/tree/1.10.2/integration/flink>", + "_schemaURL": "<https://openlineage.io/spec/facets/2-0-2/JobTypeJobFacet.json#/$defs/JobTypeJobFacet>", + "processingType": "STREAMING", + "integration": "FLINK", + "jobType": "JOB" + } + } + }, + "inputs": [ + { + "namespace": "<kafka://b-1.mskflinkopenlineage>.&lt;&gt;.<http://kafka.us-east-1.amazonaws.com:9092,b_3.mskflinkopenlineage.&lt;&gt;kafka.us_east_1.amazonaws.com:9092,b-2.mskflinkopenlineage.&lt;&gt;.c22.kafka.us-east-1.amazonaws.com:9092|kafka.us_east_1.amazonaws.com:9092,b-3.mskflinkopenlineage.&lt;&gt;kafka.us-east-1.amazonaws.com:9092,b_2.mskflinkopenlineage.&lt;&gt;.c22.kafka.us_east_1.amazonaws.com:9092>", + "name": "temperature-samples", + "facets": { + "schema": { + "_producer": "<https://github.com/OpenLineage/OpenLineage/tree/1.10.2/integration/flink>", + "_schemaURL": "<https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet>", + "fields": [ + { + "name": "sensorId", + "type": "int" + }, + { + "name": "room", + "type": "string" + }, + { + "name": "temperature", + "type": "float" + }, + { + "name": "sampleTime", + "type": "long" + } + ] + } + } + } + ], + "outputs": [ + { + "namespace": "<s3://iceberg-open-lineage-891377161433>", + "name": "/iceberg/open_lineage.db/open_lineage_room_temperature_prod", + "facets": { + "schema": { + "_producer": "<https://github.com/OpenLineage/OpenLineage/tree/1.10.2/integration/flink>", + 
"_schemaURL": "<https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet>", + "fields": [ + { + "name": "room", + "type": "STRING" + }, + { + "name": "temperature", + "type": "FLOAT" + }, + { + "name": "sampleCount", + "type": "INTEGER" + }, + { + "name": "lastSampleTime", + "type": "TIMESTAMP" + } + ] + } + } + } + ] +}

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 16:22:37
+
+

*Thread Reply:* so with that start event should marquez be able to build the proper lineage?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 16:22:57
+
+

*Thread Reply:* This is what I would get with Flink and Marquez locally

+ + + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-02 16:23:33
+
+

*Thread Reply:* yes, but then it looks like the flink job is failing and we're seeing this event: +{ + "eventTime": "2024-04-02T20:07:04.028017Z", + "producer": "<https://github.com/OpenLineage/OpenLineage/tree/1.10.2/integration/flink>", + "schemaURL": "<https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent>", + "eventType": "FAIL", + "run": { + "runId": "cda9a0d2-6dfd-4db2-b3d0-f11d7b082dc0", + "facets": { + "errorMessage": { + "_producer": "<https://github.com/OpenLineage/OpenLineage/tree/1.10.2/integration/flink>", + "_schemaURL": "<https://openlineage.io/spec/facets/1-0-0/ErrorMessageRunFacet.json#/$defs/ErrorMessageRunFacet>", + "message": "The Job Result cannot be fetched through the Job Client when in Web Submission.", + "programmingLanguage": "JAVA", + "stackTrace": "org.apache.flink.util.FlinkRuntimeException: The Job Result cannot be fetched through the Job Client when in Web Submission.ntat org.apache.flink.client.deployment.application.WebSubmissionJobClient.getJobExecutionResult(WebSubmissionJobClient.java:92)ntat org.apache.flink.client.program.StreamContextEnvironment.getJobExecutionResult(StreamContextEnvironment.java:152)ntat org.apache.flink.client.program.StreamContextEnvironment.execute(StreamContextEnvironment.java:123)ntat org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1969)ntat com.amazonaws.services.msf.KafkaStreamingJob.main(KafkaStreamingJob.java:342)ntat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)ntat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)ntat java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)ntat java.base/java.lang.reflect.Method.invoke(Method.java:566)ntat org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)ntat 
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)ntat org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)ntat org.apache.flink.client.deployment.application.DetachedApplicationRunner.tryExecuteJobs(DetachedApplicationRunner.java:84)ntat org.apache.flink.client.deployment.application.DetachedApplicationRunner.run(DetachedApplicationRunner.java:70)ntat org.apache.flink.runtime.webmonitor.handlers.JarRunOverrideHandler.lambda$handleRequest$3(JarRunOverrideHandler.java:239)ntat java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)ntat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)ntat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)ntat java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)ntat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)ntat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)ntat java.base/java.lang.Thread.run(Thread.java:829)n" + } + } + }, + "job": { + "namespace": "flink-jobs-prod", + "name": "kafka-iceberg-prod", + "facets": { + "jobType": { + "_producer": "<https://github.com/OpenLineage/OpenLineage/tree/1.10.2/integration/flink>", + "_schemaURL": "<https://openlineage.io/spec/facets/2-0-2/JobTypeJobFacet.json#/$defs/JobTypeJobFacet>", + "processingType": "STREAMING", + "integration": "FLINK", + "jobType": "JOB" + } + } + } +}

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 16:24:11
+
+

*Thread Reply:* But the thing is that the Flink job is not really failing

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-02 16:25:03
+
+

*Thread Reply:* interesting, would love to see what @Paweł Leszczyński / @Maciej Obuchowski / @Peter Huang think. This is beyond my depth on the flink integration 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-02 16:34:51
+
+

*Thread Reply:* Thanks Harel!! Yes please, it would be great to see how openlineage can work with AWS Managed flink

+ + + +
+ ➕ Harel Shein +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 02:43:12
+
+

*Thread Reply:* Just to clarify - is this setup working with the OpenLineage Flink integration turned off? From what I understand, your job emits a cool START event, then a job fails and emits a FAIL event with the error stacktrace The Job Result cannot be fetched through the Job Client when in Web Submission, which is cool as well.

+ +

The question is: does it fail because of the OpenLineage integration, or is it just OpenLineage carrying the stacktrace of a failed job? I couldn't see anything OpenLineage-related in the stacktrace.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 09:43:34
+
+

*Thread Reply:* What do you mean with Flink integration turned off?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 09:44:28
+
+

*Thread Reply:* the Flink job is not failing, but we are receiving an OpenLineage event that says FAIL, and then we don't see the proper DAG in Marquez

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 09:45:18
+
+

*Thread Reply:* does OpenLineage work if the job is submitted through web submission?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 09:47:44
+
+

*Thread Reply:* the answer is "probably not unless you can set up execution.attached beforehand"
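[Editor's note] For reference, the setting being discussed is a standard Flink configuration key; a minimal sketch of where it would go (whether a managed service lets you override it is not confirmed in this thread):

```yaml
# flink-conf.yaml (sketch) — the option discussed above; per the rest of
# this thread, web submission ignores it in practice
execution.attached: true
```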

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 09:48:49
+
+

*Thread Reply:* execution.attached doesn't seem to work with jobs submitted through web submission.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 09:54:51
+
+

*Thread Reply:* When setting execution.attached to false, I only get the START event, but it doesn't build the DAG in the job space in Marquez

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 09:57:14
+
+

*Thread Reply:*

+ + + + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 09:57:40
+
+

*Thread Reply:* I still see this in cloudwatch logs: locationInformation": "io.openlineage.flink.client.EventEmitter.emit(EventEmitter.java:50)", "logger": "io.openlineage.flink.client.EventEmitter", "message": "Failed to emit OpenLineage event: ", "messageSchemaVersion": "1", "messageType": "ERROR", "threadName": "Flink-DispatcherRestEndpoint-thread-1", "throwableInformation": "io.openlineage.client.transports.HttpTransportResponseException: code: 400, response: \n\tat io.openlineage.client.transports.HttpTransport.throwOnHttpError(HttpTransport.java:151)\n\tat io.openlineage.client.transports.HttpTransport.emit(HttpTransport.java:128)\n\tat io.openlineage.client.transports.HttpTransport.emit(HttpTransport.java:115)\n\tat io.openlineage.client.OpenLineageClient.emit(OpenLineageClient.java:60)\n\tat io.openlineage.flink.client.EventEmitter.emit(EventEmitter.java:48)\n\tat io.openlineage.flink.visitor.lifecycle.FlinkExecutionContext.lambda$onJobSubmitted$0(FlinkExecutionContext.java:66)\n\tat io.openlineage.client.circuitBreaker.NoOpCircuitBreaker.run(NoOpCircuitBreaker.java:27)\n\tat io.openlineage.flink.visitor.lifecycle.FlinkExecutionContext.onJobSubmitted(FlinkExecutionContext.java:59)\n\tat io.openlineage.flink.OpenLineageFlinkJobListener.start(OpenLineageFlinkJobListener.java:180)\n\tat io.openlineage.flink.OpenLineageFlinkJobListener.onJobSubmitted(OpenLineageFlinkJobListener.java:156)\n\tat org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.lambda$executeAsync$12(StreamExecutionEnvironment.java:2099)\n\tat java.base/java.util.ArrayList.forEach(ArrayList.java:1541)\n\tat org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2099)\n\tat org.apache.flink.client.program.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:188)\n\tat org.apache.flink.client.program.StreamContextEnvironment.execute(StreamContextEnvironment.java:119)\n\tat 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1969)\n\tat com.amazonaws.services.msf.KafkaStreamingJob.main(KafkaStreamingJob.java:345)\n\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.base/java.lang.reflect.Method.invoke(Method.java:566)\n\tat org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)\n\tat org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)\n\tat org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)\n\tat org.apache.flink.client.deployment.application.DetachedApplicationRunner.tryExecuteJobs(DetachedApplicationRunner.java:84)\n\tat org.apache.flink.client.deployment.application.DetachedApplicationRunner.run(DetachedApplicationRunner.java:70)\n\tat org.apache.flink.runtime.webmonitor.handlers.JarRunOverrideHandler.lambda$handleRequest$3(JarRunOverrideHandler.java:239)\n\tat java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\n"

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 10:01:52
+
+

*Thread Reply:* I think it will be a limitation of our integration then, at least until https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener - the way we're integrating with Flink requires it to be able to access execution results +https://github.com/OpenLineage/OpenLineage/blob/main/integration/flink/app/src/main/java/io/openlineage/flink/OpenLineageFlinkJobListener.java#L[…]6

+ +

not sure if we can somehow work around this

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 10:04:09
+
+

*Thread Reply:* with that FLIP we wouldn't need execution.attached?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 10:04:58
+
+

*Thread Reply:* Nope - it would add different mechanism to integrate with Flink other than JobListener

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 10:09:38
+
+

*Thread Reply:* Could a workaround be, instead of using the HTTP transport, sending to Kafka and having a Java/Python client write the events to Marquez?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 10:10:30
+
+

*Thread Reply:* because I just tried with execution.attached set to false and with the console transport; I just receive the START event and no errors. Not sure if that's the only event needed in Marquez to build a DAG

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 10:16:42
+
+

*Thread Reply:* also, wondering: if the event actually reached Marquez, why wouldn't the job DAG be shown?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 10:16:52
+
+

*Thread Reply:* it's the same START event I received when running locally

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 10:17:15
+
+

*Thread Reply:*

+ + + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 10:25:47
+
+

*Thread Reply:* comparison of Marquez receiving the event from Managed Flink on AWS (left) to Marquez on localhost receiving the event from local Flink. It's the same event; however, Marquez on EC2 is not building the DAG

+ + + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 10:26:14
+
+

*Thread Reply:* @Maciej Obuchowski is there any other event needed for the DAG?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 10:38:48
+
+

*Thread Reply:* > Could a workaround be, instead of having the http tranport, sending to kafka and have a java/python client writing the events to marquez? +I think there are two problems, and the 400 is probably just the followup from the original one - maybe too long stacktrace makes Marquez reject the event? +The original one, the attached one, is the cause why the integration tries to send the FAIL event at the first place
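[Editor's note] On the too-long-stacktrace hypothesis: a hedged sketch of client-side truncation that would cap the stackTrace field before emitting. The helper and the limit are hypothetical, not part of the OpenLineage client:

```python
# Hypothetical pre-emit hook: cap the errorMessage facet's stackTrace so an
# oversized payload can't trigger a 400 from the backend (limit is made up).
MAX_STACKTRACE_LEN = 8192

def truncate_stacktrace(event: dict, limit: int = MAX_STACKTRACE_LEN) -> dict:
    err = event.get("run", {}).get("facets", {}).get("errorMessage")
    if err and len(err.get("stackTrace", "")) > limit:
        err["stackTrace"] = err["stackTrace"][:limit] + "...[truncated]"
    return event
```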

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Peter Huang + (huangzhenqiu0825@gmail.com) +
+
2024-04-03 10:45:35
+
+

*Thread Reply:* For the error described in the message "The Job Result cannot be fetched through the Job Client when in Web Submission.", I feel it is a bug in Flink. Which version of Flink are you using? @Francisco Morillo

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 11:02:46
+
+

*Thread Reply:* looking at implementation, it seems to be by design: +/** + * A {@link JobClient} that only allows asking for the job id of the job it is attached to. + * + * &lt;p&gt;This is used in web submission, where we do not want the Web UI to have jobs blocking threads + * while waiting for their completion. + */

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Peter Huang + (huangzhenqiu0825@gmail.com) +
+
2024-04-03 11:32:51
+
+

*Thread Reply:* Yes, it looks like the Flink code tries to fetch the Job Result for the web submission job, thus the exception is raised.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 12:27:05
+
+

*Thread Reply:* Flink 1.15.2

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 12:28:00
+
+

*Thread Reply:* But still, wouldn't Marquez be able to build the DAG with the START event?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 12:28:50
+
+

*Thread Reply:* In Marquez, a new dataset version is created when the run completes

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 12:29:14
+
+

*Thread Reply:* but that doesn't show as events in Marquez, right?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 12:29:33
+
+

*Thread Reply:* I think that was going to be changed for streaming jobs - right @Paweł Leszczyński? - but not sure if that's already merged

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 12:33:34
+
+

*Thread Reply:* in the latest Marquez version?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 12:41:52
+
+

*Thread Reply:* is this the right transport url? props.put("openlineage.transport.url","http://localhost:5000/api/v1/lineage");

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 12:42:36
+
+

*Thread Reply:* because I was able to see streaming jobs in Marquez when running locally, as well as having a local Flink job write to the Marquez on EC2. It's as if the dataset and job don't get created in Marquez from the event

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 13:05:28
+
+

*Thread Reply:* I tried with Flink 1.18 and got the same result: I receive the START event, but the job and dataset are not created in Marquez

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 13:15:59
+
+

*Thread Reply:* If I try locally and set execution.attached to false, it does work. So it seems the main issue is that OpenLineage doesn't work with Flink jobs submitted through the web UI

+ + + +
+ 👀 Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Peter Huang + (huangzhenqiu0825@gmail.com) +
+
2024-04-03 16:54:20
+
+

*Thread Reply:* From my understanding so far, setting execution.attached = false mitigates the exception in Flink (at least from the Flink code, that is the logic). On the other hand, the question goes to when to build the DAG on receiving events. @Paweł Leszczyński In our org, we changed the default behavior: the Flink listener periodically sends RUNNING events out. Once the lineage backend receives a RUNNING event, a new DAG is created.
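[Editor's note] A minimal sketch of what such a periodic RUNNING event could look like. Field names follow the OpenLineage RunEvent spec; the producer URL and the job names are illustrative:

```python
import datetime
import uuid

def running_event(run_id: str, namespace: str, name: str) -> dict:
    # Heartbeat-style RUNNING event for a streaming job; a backend could
    # (re)build the DAG whenever one of these arrives.
    return {
        "eventTime": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "eventType": "RUNNING",
        "producer": "https://github.com/OpenLineage/OpenLineage/tree/main/integration/flink",
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
        "run": {"runId": run_id},
        "job": {"namespace": namespace, "name": name},
    }

evt = running_event(str(uuid.uuid4()), "flink-jobs-prod", "kafka-iceberg-prod")
```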

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 17:00:26
+
+

*Thread Reply:* How can I configure that?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Peter Huang + (huangzhenqiu0825@gmail.com) +
+
2024-04-03 17:02:00
+
+

*Thread Reply:* To send periodic RUNNING events, some changes are needed in the OpenLineage Flink lib. Let's wait for @Paweł Leszczyński for a concrete plan. I am glad to create a PR for this.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 17:05:28
+
+

*Thread Reply:* I'm still wondering why the DAG was not created in Marquez, unless there are some other events that OpenLineage sends to build the job and dataset that don't work when submitted through the web UI. I will try to replicate in EMR

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 20:37:03
+
+

*Thread Reply:* Looking at the Marquez logs, I'm seeing this

+ +

arquez.api.OpenLineageResource: Unexpected error while processing request +! java.lang.IllegalArgumentException: namespace '<kafka://b-1.mskflinkopenlineage.fdz2z7.c22.kafka.us-east-1.amazonaws.com:9092>,b-3.mskflinkopenlineage.fdz2z7.c22.kafka.us-east-1.amazonaws.com:9092,b_2.mskflinkopenlineage.fdz2z7.c22.kafka.us_east_1.amazonaws.com:9092' must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), at (@), plus (+), dashes (-), colons (:), equals (=), semicolons (;), slashes (/) or dots (.) with a maximum length of 1024 characters.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 20:37:38
+
+

*Thread Reply:* can Marquez work with MSK?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-04 02:43:06
+
+

*Thread Reply:* The graph on Marquez side should be present just after sending START event, once the START contains information about input/output datasets. Commas are the problem here and we should modify Flink integration to separate broker list by a semicolon.
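[Editor's note] To make the failure mode concrete, here is a rough approximation of the character check implied by the Marquez error above. The regex is reconstructed from the error message, not taken from Marquez source:

```python
import re

# Allowed characters per the error message: letters, digits, _ @ + - : = ; / .
# with a maximum length of 1024. Commas are not in the list, which is why a
# comma-separated Kafka broker list is rejected while semicolons pass.
NAMESPACE_RE = re.compile(r"^[A-Za-z0-9_@+\-:=;/.]{1,1024}$")

def is_valid_namespace(ns: str) -> bool:
    return bool(NAMESPACE_RE.match(ns))

comma_separated = "kafka://b-1.example.com:9092,b-2.example.com:9092"
semicolon_separated = comma_separated.replace(",", ";")
```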

+ + + +
+ ✅ Francisco Morillo +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Fabio Manganiello + (fabio.manganiello@booking.com) +
+
2024-04-03 05:50:05
+
+

Hi all, I've opened a PR for the dbt-ol script. We've noticed that the script doesn't transparently return/exit the exit code of the child dbt process. This makes it hard for the parent process to tell if the underlying workflow succeeded or failed - in the case of Airflow, the parent DAG will mark the job as succeeded even if it actually failed. Let me know if you have thought/comments (cc @Arnab Bhattacharyya)
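[Editor's note] The fix boils down to mirroring the child process's exit status; a minimal sketch of the behavior (a hypothetical wrapper, not the actual dbt-ol code):

```python
import subprocess
import sys

def run_and_propagate(cmd: list) -> int:
    # Run the wrapped command (dbt, in dbt-ol's case) and hand its exit
    # code back unchanged, so a parent scheduler such as Airflow can tell
    # success from failure.
    proc = subprocess.run(cmd)
    return proc.returncode

# A child that exits 3 should surface as 3 to the caller.
code = run_and_propagate([sys.executable, "-c", "import sys; sys.exit(3)"])
```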

+
+ + + + + + + +
+
Labels
+ integration/dbt +
+ + + + + + + + + + +
+ + + +
+ ❤️ Harel Shein +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Tristan GUEZENNEC -CROIX- + (tristan.guezennec@decathlon.com) +
+
2024-04-04 04:41:36
+
+

*Thread Reply:* @Sophie LY FYI

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Fabio Manganiello + (fabio.manganiello@booking.com) +
+
2024-04-03 06:33:34
+
+

Is there a timeline for the 1.11.0 release? Now that the dbt-ol fix has been merged we may either wait for the release or temporarily point to main

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-03 06:34:09
+
+

*Thread Reply:* I think it’s going to be today or really soon. cc: @Michael Robinson

+ + + +
+ 🎉 Fabio Manganiello +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 06:37:45
+
+

*Thread Reply:* would be great if we could fix the unknown facet memory issue in this release, I think @Paweł Leszczyński @Damien Hawes are working on it

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 06:38:02
+
+

*Thread Reply:* I think this is a critical kind of bug

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 06:39:27
+
+

*Thread Reply:* Yeah, it's a tough-to-figure-out-where-the-fix-should-be kind of bug.

+ + + +
+ 😨 Jakub Dardziński +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 06:39:56
+
+

*Thread Reply:* The solution is simple, at least in my mind. If spark_unknown is disabled, don't accumulate state.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 06:40:11
+
+

*Thread Reply:* I think we should go first with the unknown entry facet, as it has the bigger impact

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 06:40:12
+
+

*Thread Reply:* if there's no better fast idea, just disable that facet for now?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 06:40:26
+
+

*Thread Reply:* It doesn't matter if the facet is disabled or not

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 06:40:38
+
+

*Thread Reply:* The UnknownEntryFacetListener still accumulates state

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 06:40:48
+
+

*Thread Reply:* @Damien Hawes will you be able to prepare this today/tomorrow?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 06:40:58
+
+

*Thread Reply:* disable == comment/remove code related to it, together with UnknownEntryFacetListener 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 06:40:59
+
+

*Thread Reply:* I'm working on it today

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 06:41:01
+
+

*Thread Reply:* in this case 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 06:41:31
+
+

*Thread Reply:* You're proposing to rip the code out completely?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 06:42:02
+
+

*Thread Reply:* at least for this release - I think it's better to release code without it and without memory bug, rather than having it bugged as it is

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 06:42:06
+
+

*Thread Reply:* The only place where I see it being applied is here:

+ +

``` private <L extends LogicalPlan> QueryPlanVisitor<L, D> asQueryPlanVisitor(T event) { + AbstractQueryPlanDatasetBuilder<T, P, D> builder = this; + return new QueryPlanVisitor<L, D>(context) { + @Override + public boolean isDefinedAt(LogicalPlan x) { + return builder.isDefinedAt(event) && isDefinedAtLogicalPlan(x); + }

+ +
  @Override
+  public List&lt;D&gt; apply(LogicalPlan x) {
+    unknownEntryFacetListener.accept(x);
+    return builder.apply(event, (P) x);
+  }
+};
+
+ +

}```

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 06:42:11
+
+

*Thread Reply:* come on, this should be a few lines of change

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 06:42:17
+
+

*Thread Reply:* Inside: AbstractQueryPlanDatasetBuilder

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 06:42:21
+
+

*Thread Reply:* once we know what it is

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 06:42:32
+
+

*Thread Reply:* it's useful in some narrow debug cases, but the memory bug potentially impacts all

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 06:43:15
+
+

*Thread Reply:* openLineageContext + .getQueryExecution() + .filter(qe -&gt; !FacetUtils.isFacetDisabled(openLineageContext, "spark_unknown")) + .flatMap(qe -&gt; unknownEntryFacetListener.build(qe.optimizedPlan())) + .ifPresent(facet -&gt; runFacetsBuilder.put("spark_unknown", facet)); +this should always clean the listener

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 06:43:19
+
+

*Thread Reply:* @Paweł Leszczyński - every time AbstractQueryPlanDatasetBuilder#apply is called, the UnknownEntryFacetListener is invoked

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 06:43:38
+
+

*Thread Reply:* the code is within OpenLineageRunEventBuilder

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 06:43:50
+
+

*Thread Reply:* @Paweł Leszczyński - it will only clean the listener, if spark_unknown is enabled

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 06:43:56
+
+

*Thread Reply:* because of that filter step

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 06:44:11
+
+

*Thread Reply:* but the listener still accumulates state, regardless of that snippet you shared

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 06:44:12
+
+

*Thread Reply:* yes, and we need to modify it to always clean

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 06:45:45
+
+

*Thread Reply:* We have a difference in understanding here, I think.

+ +
  1. If spark_unknown is disabled, the UnknownEntryFacetListener still accumulates state. Your proposed change will not clean that state.
  2. If spark_unknown is enabled, well, sometimes we get StackOverflow errors due to infinite recursion during serialisation.
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 06:46:35
+
+

*Thread Reply:* just to step back from the particular solution for a moment: I would love it if we could release with either

+ +
  1. a proper fix that won't accumulate memory if the facet is disabled, and cleans it up if it's not
  2. have that facet removed for now +I don't want to have a release now that will contain this bug, because we're trying to do a "good" solution but have no time to do it properly for the release
+ + + +
+ 👍 Damien Hawes +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 06:46:57
+
+

*Thread Reply:* I think the impact of this bug is big

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 06:47:24
+
+

*Thread Reply:* My opinion is that perhaps the OpenLineageContext object needs to be extended to hold which facets are enabled / disabled.

+ + + +
+ ➕ Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 06:47:52
+
+

*Thread Reply:* This way, things that inherit from AbstractQueryPlanDatasetBuilder can check, should they be a no-op or not

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 06:48:36
+
+

*Thread Reply:* Or, +```private <L extends LogicalPlan> QueryPlanVisitor<L, D> asQueryPlanVisitor(T event) { + AbstractQueryPlanDatasetBuilder<T, P, D> builder = this; + return new QueryPlanVisitor<L, D>(context) { + @Override + public boolean isDefinedAt(LogicalPlan x) { + return builder.isDefinedAt(event) && isDefinedAtLogicalPlan(x); + }

+ +
@Override
+public List&lt;D&gt; apply(LogicalPlan x) {
+  unknownEntryFacetListener.accept(x);
+  return builder.apply(event, (P) x);
+}
+
+ +

}; +}``` +This needs to be changed

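The no-op guard being discussed can be sketched in isolation. Below is a hypothetical, simplified Python analogue (the real integration is Java; every name here is invented for illustration): the stateful listener is only invoked when the facet is enabled, so no state accumulates that nothing will ever clean up.

```python
class UnknownEntryListener:
    """Toy stand-in for a stateful listener: it accumulates every plan it visits."""

    def __init__(self):
        self.visited = []

    def accept(self, plan):
        self.visited.append(plan)


def apply_builder(listener, facet_enabled, plan):
    # Guard: only feed the stateful listener when the facet is enabled,
    # so a disabled facet leaves no state behind to leak.
    if facet_enabled:
        listener.accept(plan)
    return [f"dataset-for-{plan}"]


listener = UnknownEntryListener()
apply_builder(listener, facet_enabled=False, plan="LogicalRelation")
assert listener.visited == []  # disabled: nothing accumulated
apply_builder(listener, facet_enabled=True, plan="LogicalRelation")
assert listener.visited == ["LogicalRelation"]  # enabled: collected, to be cleared after emit
```

The design point is simply that the enabled/disabled decision happens before the stateful call, not after.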
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 06:48:40
+
+

*Thread Reply:* @Damien Hawes could u look at this again https://github.com/OpenLineage/OpenLineage/pull/2557/files ?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 06:49:27
+
+

*Thread Reply:* i think clearing visitedNodes within populateRun should solve this

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 06:51:01
+
+

*Thread Reply:* the solution is (1) don't store logical plans, but their string representation (2) clear what you collected after populating a facet

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 06:51:18
+
+

*Thread Reply:* even if it works, I still don't really like it because we accumulate state in asQueryPlanVisitor just to clear it later

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 06:51:19
+
+

*Thread Reply:* It works, but I'm still annoyed that UnknownEntryFacetListener is being called in the first place

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 06:51:46
+
+

*Thread Reply:* also i think in case of really large plans it could be an issue still?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 06:53:06
+
+

*Thread Reply:* why @Maciej Obuchowski?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 06:55:47
+
+

*Thread Reply:* we've seen >20MB serialized logical plans, and that's what essentially treeString does if I understand it correctly

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 06:56:56
+
+

*Thread Reply:* and then the serialization can potentially still take some time...

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 07:01:19
+
+

*Thread Reply:* where did you find treeString serializes a plan?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 07:05:44
+
+

*Thread Reply:* treeString is used by the default toString method of TreeNode, so it would be super weird if they serialized the entire object within it. I couldn't find any such code within the Spark implementation

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 07:19:02
+
+

*Thread Reply:* I'll also remind you that there is the problem with the job metrics holder as well

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 07:19:17
+
+

*Thread Reply:* That will also, eventually, cause an OOM crash

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 07:27:41
+
+

*Thread Reply:* So, I agree the UnknownEntryFacetListener code should not be called if a facet is disabled. I agree we should have another PR and a fix for job metrics.

+ +

The question is: what do we want to have shipped within the next release? Do we want to get rid of the static member that accumulates all the logical plans (which is the cleaner approach) or just clear it once it's not needed anymore? I think we'll need to clear it anyway in case someone turns the unknown facet feature on.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 07:39:09
+
+

*Thread Reply:* In my opinion, the approach for the immediate release is to clear the plans. Though, I'd like tests that prove it works.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 08:10:02
+
+

*Thread Reply:* @Damien Hawes so let's go with Paweł's PR?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 08:24:04
+
+

*Thread Reply:* So, proving this helps would be great. One option would be to prepare an integration test that runs something and verifies later on that the private static map is empty. Another, way nicer, option would be to write code that generates a few-MB dataset, reads it into memory, and saves it into a file, and then, within the integration test code, runs something like https://github.com/jerolba/jmnemohistosyne to see the memory consumption of the classes we're interested in (not sure how difficult such a thing is to write)

+ +

This could also be beneficial to prevent similar issues in the future and to solve the job metrics issue.

+
+ + + + + + + +
+
Stars
+ 15 +
+ +
+
Language
+ Java +
+ + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 09:02:37
+
+

*Thread Reply:* @Damien Hawes @Paweł Leszczyński would be great to clarify if you're working on it now

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 09:02:43
+
+

*Thread Reply:* as this blocks release

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 09:02:47
+
+

*Thread Reply:* fyi @Michael Robinson

+ + + +
+ 👍 Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-03 09:48:12
+
+

*Thread Reply:* I can try to prove that the PR I proposed brings improvement. However, if Damien wants to work on his approach targeting this release, I am happy to hand it over.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 10:20:24
+
+

*Thread Reply:* I'm not working on it at the moment. I think Pawel's approach is fine for the time being.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 10:20:31
+
+

*Thread Reply:* I'll focus on the JobMetricsHolder problem

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 10:24:54
+
+

*Thread Reply:* Side note: @Paweł Leszczyński @Maciej Obuchowski - are you able to give any guidance why the UnknownEntryFacetListener was implemented that way, as opposed to just examining the event in a stateless manner?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:18:28
+
+

*Thread Reply:* OK. @Paweł Leszczyński @Maciej Obuchowski - I think I found the memory leak with JobMetricsHolder. If we receive an event like SparkListenerJobStart, but there isn't any dataset in it, it looks like we're storing the metrics, but we never get rid of them.

+ + + +
+ 😬 Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:21:50
+
+

*Thread Reply:* Here's the logs

+ +
+ + + + + + + +
+ + +
+ 🙌 Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 11:36:50
+
+

*Thread Reply:* &gt; Side note: @Paweł Leszczyński @Maciej Obuchowski - are you able to give any guidance why the UnknownEntryFacetListener was implemented that way, as opposed to just examining the event in a stateless manner? +It's one of the older parts of the codebase, implemented mostly in 2021 by a person no longer associated with the project... hard to tell, to be honest 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 11:37:52
+
+

*Thread Reply:* but I think we have much more freedom to modify it, as it's not a standardized or user-facing feature

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 11:47:02
+
+

*Thread Reply:* to solve the stageMetrics issue - should they always be a separate Map per job, associated with the jobId, allowing it to be easily cleaned... but there's no jobId on SparkListenerTaskEnd

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:47:16
+
+

*Thread Reply:* Nah

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:47:17
+
+

*Thread Reply:* Actually

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:47:21
+
+

*Thread Reply:* Its simpler than that

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:47:36
+
+

*Thread Reply:* The bug is here:

+ +

public void cleanUp(int jobId) { + Set&lt;Integer&gt; stages = jobStages.remove(jobId); + stages = stages == null ? Collections.emptySet() : stages; + stages.forEach(jobStages::remove); + }

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:47:51
+
+

*Thread Reply:* We remove from jobStages N + 1 times

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:48:14
+
+

*Thread Reply:* JobStages is supposed to carry a mapping from Job -&gt; Stage

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:48:30
+
+

*Thread Reply:* and stageMetrics a mapping from Stage -&gt; TaskMetrics

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 11:49:00
+
+

*Thread Reply:* ah yes

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:49:03
+
+

*Thread Reply:* Here, we remove the job from jobStages, and obtain the associated stages, and then we use those stages to remove from jobStages again

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:49:11
+
+

*Thread Reply:* It's a "huh?" moment

+ + + +
+ 😂 Jakub Dardziński +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:49:53
+
+

*Thread Reply:* The amount of logging I added, just to see this, was crazy

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 11:50:46
+
+

*Thread Reply:* public void cleanUp(int jobId) { + Set&lt;Integer&gt; stages = jobStages.remove(jobId); + stages = stages == null ? Collections.emptySet() : stages; + stages.forEach(stageMetrics::remove); + } +so it's just jobStages -> stageMetrics here, right?

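The one-word nature of the fix is easy to verify in isolation. Here is a hypothetical, simplified Python model of the two maps (the names echo the thread; this is not the actual Java code): the buggy version pops from jobStages a second time, so stageMetrics is never reclaimed.

```python
def clean_up_buggy(job_stages, stage_metrics, job_id):
    """Model of the original cleanUp: removes from the wrong map."""
    stages = job_stages.pop(job_id, None) or set()
    for stage in stages:
        job_stages.pop(stage, None)   # bug: touches jobStages again; stage_metrics leaks


def clean_up_fixed(job_stages, stage_metrics, job_id):
    """Model of the corrected cleanUp: reclaims the accumulated metrics."""
    stages = job_stages.pop(job_id, None) or set()
    for stage in stages:
        stage_metrics.pop(stage, None)  # fix: jobStages -> stage_metrics


# jobId 1 ran stages 10 and 11, each of which accumulated task metrics.
job_stages, stage_metrics = {1: {10, 11}}, {10: "metrics", 11: "metrics"}
clean_up_buggy(job_stages, stage_metrics, 1)
assert stage_metrics == {10: "metrics", 11: "metrics"}  # leaked forever

job_stages, stage_metrics = {1: {10, 11}}, {10: "metrics", 11: "metrics"}
clean_up_fixed(job_stages, stage_metrics, 1)
assert stage_metrics == {}  # reclaimed
```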
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:50:57
+
+

*Thread Reply:* Yup

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 11:51:09
+
+

*Thread Reply:* yeah it looks so obvious after seeing that 😄

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:51:40
+
+

*Thread Reply:* I even wrote a separate method to clear the stageMetrics map

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 11:51:41
+
+

*Thread Reply:* it was there since 2021 in that form 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:52:00
+
+

*Thread Reply:* and placed it in the same locations as the cleanUp method in the OpenLineageSparkListener

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:52:09
+
+

*Thread Reply:* Wrote a unit test

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:52:12
+
+

*Thread Reply:* It fails

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:52:17
+
+

*Thread Reply:* and I was like, "why?"

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 11:52:25
+
+

*Thread Reply:* Investigate further, and then I noticed this method

+ + + +
+ 😄 Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-03 12:33:42
+ +
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-04-03 14:39:06
+
+

*Thread Reply:* Has Damien's PR unblocked the release?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 14:39:33
+
+

*Thread Reply:* No, we need one more from Paweł

+ + + +
+ :gratitude_thank_you: Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-04 10:37:42
+
+

*Thread Reply:* OK. Pawel's PR has been merged @Michael Robinson

+ + + +
+ 👍 Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Damien Hawes + (damien.hawes@booking.com) +
+
2024-04-04 12:12:28
+
+

*Thread Reply:* Given these developments, I'd like to call for a release of 1.11.0 to happen today, unless there are any objections.

+ + + +
+ ➕ Harel Shein, Jakub Dardziński +
+ +
+ 👀 Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-04-04 12:28:38
+
+

*Thread Reply:* Changelog PR is RFR: https://github.com/OpenLineage/OpenLineage/pull/2574

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-04 14:29:04
+
+

*Thread Reply:* CircleCI has problems

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Fabio Manganiello + (fabio.manganiello@booking.com) +
+
2024-04-04 18:12:27
+
+

*Thread Reply:* ```self = <tests.conftest.DagsterRunLatestProvider object at 0x7fcd84faed60> +repositoryname = 'testrepo'

+ +
def get_instance(self, repository_name: str) -&gt; DagsterRun:
+
+ +

> from dagster.core.remoterepresentation.origin import ( + ExternalJobOrigin, + ExternalRepositoryOrigin, + InProcessCodeLocationOrigin, + ) +E ImportError: cannot import name 'ExternalJobOrigin' from 'dagster.core.remoterepresentation.origin' (/home/circleci/.pyenv/versions/3.8.19/lib/python3.8/site-packages/dagster/core/remote_representation/origin.py)

+ +

tests/conftest.py:140: ImportError```

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Fabio Manganiello + (fabio.manganiello@booking.com) +
+
2024-04-04 18:12:39
+
+

*Thread Reply:* &gt;&gt;&gt; from dagster.core.remote_representation.origin import ( +... ExternalJobOrigin, +... ExternalRepositoryOrigin, +... InProcessCodeLocationOrigin, +... ) +Traceback (most recent call last): + File "&lt;stdin&gt;", line 1, in &lt;module&gt; + File "&lt;frozen importlib._bootstrap&gt;", line 1176, in _find_and_load + File "&lt;frozen importlib._bootstrap&gt;", line 1138, in _find_and_load_unlocked + File "&lt;frozen importlib._bootstrap&gt;", line 1078, in _find_spec + File "/home/blacklight/git_tree/OpenLineage/venv/lib/python3.11/site-packages/dagster/_module_alias_map.py", line 36, in find_spec + assert base_spec, f"Could not find module spec for {base_name}." +AssertionError: Could not find module spec for dagster._core.remote_representation. +&gt;&gt;&gt; from dagster.core.host_representation.origin import ( +... ExternalJobOrigin, +... ExternalRepositoryOrigin, +... InProcessCodeLocationOrigin, +... ) +&gt;&gt;&gt; ExternalJobOrigin +&lt;class 'dagster._core.host_representation.origin.ExternalJobOrigin'&gt;

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Fabio Manganiello + (fabio.manganiello@booking.com) +
+
2024-04-04 18:13:07
+
+

*Thread Reply:* It seems that the parent module should be dagster.core.host_representation.origin, not dagster.core.remote_representation.origin

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-04 18:14:55
+
+

*Thread Reply:* did you rebase? for >=1.6.9 it’s dagster.core.remote_representation.origin, should be ok

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Fabio Manganiello + (fabio.manganiello@booking.com) +
+
2024-04-04 18:18:06
+
+

*Thread Reply:* Indeed, I was just looking at https://github.com/dagster-io/dagster/pull/20323 (merged 4 weeks ago)

+
+ + + + + + + +
+
Comments
+ 1 +
+ + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Fabio Manganiello + (fabio.manganiello@booking.com) +
+
2024-04-04 18:18:43
+
+

*Thread Reply:* I did a pip install of the integration from main and it seems to install a previous version though:

+ +

&gt;&gt;&gt; dagster.__version__ +'1.6.5'

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-04 18:18:59
+
+

*Thread Reply:* try --force-reinstall maybe

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-04 18:19:08
+
+

*Thread Reply:* it works fine for me, CI doesn’t crash either

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Fabio Manganiello + (fabio.manganiello@booking.com) +
+
2024-04-04 18:20:09
+
+

*Thread Reply:* https://app.circleci.com/pipelines/github/OpenLineage/OpenLineage/10020/workflows/4d3a33b4-47ef-4cf6-b6de-1bb95611fad7/jobs/200011 (although the ImportError seems to be different from mine)

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-04 18:20:53
+
+

*Thread Reply:* huh, how didn’t I see this

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-04 18:21:30
+
+

*Thread Reply:* I think we should limit upper version of dagster, it’s not even really maintained

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Fabio Manganiello + (fabio.manganiello@booking.com) +
+
2024-04-04 18:28:14
+
+

*Thread Reply:* I've also just noticed that ExternalJobOrigin and ExternalRepositoryOrigin have been renamed to RemoteJobOrigin and RemoteRepositoryOrigin in 1.7.0 - and that's apparently the version the CI installed

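A version-tolerant import shim is one hedged way to cope with renames like this until versions are pinned. The helper below is a generic sketch, not part of the integration; the dagster module paths in the comment are taken from this thread and may vary by version.

```python
import importlib


def first_importable(*module_paths):
    """Return the first module that imports successfully; re-raise the last ImportError otherwise."""
    last_err = None
    for path in module_paths:
        try:
            return importlib.import_module(path)
        except ImportError as err:
            last_err = err
    raise last_err


# Hypothetical usage for dagster (paths as discussed above, not verified here):
# origin = first_importable(
#     "dagster._core.remote_representation.origin",  # dagster >= 1.6.9
#     "dagster._core.host_representation.origin",    # older dagster
# )

# Demonstration with stdlib modules only:
mod = first_importable("definitely_not_a_module_xyz", "json")
assert mod.__name__ == "json"
```

Pinning the upper bound of the dependency (as done in the linked PR) is usually the more robust fix; a shim like this only helps when both versions must be supported at once.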
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-04 18:28:32
+
+

*Thread Reply:* https://github.com/OpenLineage/OpenLineage/pull/2579

+ + + +
+ 👍 Fabio Manganiello +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mantas Mykolaitis + (mantasmy@wix.com) +
+
2024-04-03 07:24:26
+
+

Hey 👋 +When I am running TrinoOperator on Airflow 2.7 I am getting this: +[2024-04-03, 11:10:44 UTC] {base.py:162} WARNING - OpenLineage provider method failed to extract data from provider. +[2024-04-03, 11:10:44 UTC] {manager.py:276} WARNING - Extractor returns non-valid metadata: None +I've upgraded apache-airflow-providers-openlineage to 1.6.0 (maybe it is too new for Airflow 2.7?). +And due to the warning I end up with empty input/output facets... It seems it is not able to connect to Trino and extract the table structure... When I tried on our prod Airflow version (2.6.3) with openlineage-airflow, it was able to connect and extract the table structure, but not to do the column-level lineage mapping.

+ +

Any input would be very helpful. +Thanks

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mantas Mykolaitis + (mantasmy@wix.com) +
+
2024-04-03 07:28:29
+
+

*Thread Reply:* Tried with default version of OL plugin that comes with 2.7 Airflow (1.0.1) so result was the same

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-03 07:31:55
+
+

*Thread Reply:* Could you please enable DEBUG logs in Airflow and provide them?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mantas Mykolaitis + (mantasmy@wix.com) +
+
2024-04-03 07:42:14
+
+

*Thread Reply:*

+ +
+ + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-03 07:50:30
+
+

*Thread Reply:* thanks +it seems like this is only the beginning of the logs. I’m assuming it fails on the complete event

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mantas Mykolaitis + (mantasmy@wix.com) +
+
2024-04-03 07:56:00
+
+

*Thread Reply:* I am sorry! This is the full log

+ +
+ + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mantas Mykolaitis + (mantasmy@wix.com) +
+
2024-04-03 08:00:03
+
+

*Thread Reply:* What I also just realised is that we have our own TrinoOperator implementation, which inherits from SQLExecuteQueryOperator (same as the original TrinoOperator)... So maybe inlets and outlets aren't being set because of that

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-03 08:00:52
+
+

*Thread Reply:* yeah, it could interfere

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mantas Mykolaitis + (mantasmy@wix.com) +
+
2024-04-03 08:01:04
+
+

*Thread Reply:* But task was rather simple: +create_table_apps_log_test = TrinoOperator( + task_id=f"create_table_test", + sql=""" + CREATE TABLE if not exists mytable as + SELECT app_id, msid, instance_id from table limit 1 + """ +)

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-03 08:01:26
+
+

*Thread Reply:* do you use some other hook to connect to Trino?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mantas Mykolaitis + (mantasmy@wix.com) +
+
2024-04-03 08:03:12
+
+

*Thread Reply:* Just checked. So we have our own hook to connect to Trino... that inherits from TrinoHook 🙄

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-03 08:06:05
+
+

*Thread Reply:* hard to say, you could check https://github.com/apache/airflow/blob/main/airflow/providers/trino/hooks/trino.py#L252 to see how the integration collects the basic information on how to retrieve the connection

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mantas Mykolaitis + (mantasmy@wix.com) +
+
2024-04-03 08:10:24
+
+

*Thread Reply:* Just wondering why it worked with Airflow 2.6.3 and the openlineage-airflow package; it seems it was accessing Trino differently

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mantas Mykolaitis + (mantasmy@wix.com) +
+
2024-04-03 08:10:40
+
+

*Thread Reply:* But anyways, will try to look more into it. Thanks for tips!

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Jakub Dardziński + (jakub.dardzinski@getindata.com) +
+
2024-04-03 08:12:13
+
+

*Thread Reply:* please let me know your findings, it might be some bug introduced in provider package

+ + + +
+ 👍 Mantas Mykolaitis +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-04-03 08:29:07
+
+

Looking for some help with Spark and the “UNCLASSIFIED_ERROR; An error occurred while calling o110.load. Cannot call methods on a stopped SparkContext.” We are not getting any OpenLineage data in CloudWatch or in sparkHistoryLogs. +(more details in thread - should I be making this into a github issue instead?)

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-04-03 08:29:29
+
+

*Thread Reply:* The python code:

+ +

import sys +from awsglue.transforms import * +from awsglue.utils import getResolvedOptions +from pyspark.context import SparkContext +from pyspark.conf import SparkConf +from awsglue.context import GlueContext +from awsglue.job import Job

+ +

conf = SparkConf() +conf.set("spark.extraListeners","io.openlineage.spark.agent.OpenLineageSparkListener")\ + .set("spark.jars.packages","io.openlineage:openlineage_spark:1.10.2")\ + .set("spark.openlineage.version","v1")\ + .set("spark.openlineage.namespace","OL_EXAMPLE_DN")\ + .set("spark.openlineage.transport.type","console") +## @params: [JOB_NAME] +args = getResolvedOptions(sys.argv, ['JOB_NAME'])

+ +

sc = SparkContext.getOrCreate(conf=conf) +glueContext = GlueContext(sc) +spark = glueContext.spark_session +job = Job(glueContext) +job.init(args['JOB_NAME'], args) +df = spark.read.format("csv").option("header","true").load("<s3-folder-path>") +df.write.format("csv").option("header","true").save("<s3-folder-path>",mode='overwrite') +job.commit()

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-04-03 08:29:32
+
+

*Thread Reply:* Nothing appears in cloudwatch, or in the sparkHistoryLogs. Here's the jr_runid file from sparkHistoryLogs - it shows that the work was done, but nothing about openlineage or where the spark session was stopped before OL could do anything: +{ + "Event": "SparkListenerApplicationStart", + "App Name": "nativespark-check_python_-jr_<jrid>", + "App ID": "spark-application-0", + "Timestamp": 0, + "User": "spark" +} +{ + "Event": "org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart", + "executionId": 0, + "description": "load at NativeMethodAccessorImpl.java:0", + "details": "org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:185)\nsun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\nsun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\nsun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\njava.lang.reflect.Method.invoke(Method.java:498)\npy4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)\npy4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)\npy4j.Gateway.invoke(Gateway.java:282)\npy4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)\npy4j.commands.CallCommand.execute(CallCommand.java:79)\npy4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)\npy4j.ClientServerConnection.run(ClientServerConnection.java:106)\njava.lang.Thread.run(Thread.java:750)", + "physicalPlanDescription": "== Parsed Logical Plan ==\nGlobalLimit 1\n+- LocalLimit 1\n +- Filter (length(trim(value#7, None)) > 0)\n +- Project [value#0 AS value#7]\n +- Project [value#0]\n +- Relation [value#0] text\n\n== Analyzed Logical Plan ==\nvalue: string\nGlobalLimit 1\n+- LocalLimit 1\n +- Filter (length(trim(value#7, None)) > 0)\n +- Project [value#0 AS value#7]\n +- Project [value#0]\n +- Relation [value#0] text\n\n== Optimized Logical Plan ==\nGlobalLimit 1\n+- LocalLimit 1\n +- Filter (length(trim(value#0, None)) > 0)\n +- Relation [value#0] 
text\n\n== Physical Plan ==\nCollectLimit 1\n+- **(1) Filter (length(trim(value#0, None)) > 0)\n +- FileScan text [value#0] Batched: false, DataFilters: [(length(trim(value#0, None)) > 0)], Format: Text, Location: InMemoryFileIndex(1 paths)[<s3-csv-file>], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<value:string>\n", + "sparkPlanInfo": { + "nodeName": "CollectLimit", + "simpleString": "CollectLimit 1", + "children": [ + { + "nodeName": "WholeStageCodegen (1)", + "simpleString": "WholeStageCodegen (1)", + "children": [ + { + "nodeName": "Filter", + "simpleString": "Filter (length(trim(value#0, None)) > 0)", + "children": [ + { + "nodeName": "InputAdapter", + "simpleString": "InputAdapter", + "children": [ + { + "nodeName": "Scan text ", + "simpleString": "FileScan text [value#0] Batched: false, DataFilters: [(length(trim(value#0, None)) > 0)], Format: Text, Location: InMemoryFileIndex(1 paths)[<s3-csv-file>], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<value:string>", + "children": [], + "metadata": { + "Location": "InMemoryFileIndex(1 paths)[<s3-csv-file>]", + "ReadSchema": "struct<value:string>", + "Format": "Text", + "Batched": "false", + "PartitionFilters": "[]", + "PushedFilters": "[]", + "DataFilters": "[(length(trim(value#0, None)) > 0)]" + }, + "metrics": [ + { + "name": "number of output rows from cache", + "accumulatorId": 14, + "metricType": "sum" + }, + { + "name": "number of files read", + "accumulatorId": 15, + "metricType": "sum" + }, + { + "name": "metadata time", + "accumulatorId": 16, + "metricType": "timing" + }, + { + "name": "size of files read", + "accumulatorId": 17, + "metricType": "size" + }, + { + "name": "max size of file split", + "accumulatorId": 18, + "metricType": "size" + }, + { + "name": "number of output rows", + "accumulatorId": 13, + "metricType": "sum" + } + ] + } + ], + "metadata": {}, + "metrics": [] + } + ], + "metadata": {}, + "metrics": [ + { + "name": "number of output rows", + 
"accumulatorId": 12, + "metricType": "sum" + } + ] + } + ], + "metadata": {}, + "metrics": [ + { + "name": "duration", + "accumulatorId": 11, + "metricType": "timing" + } + ] + } + ], + "metadata": {}, + "metrics": [ + { + "name": "shuffle records written", + "accumulatorId": 9, + "metricType": "sum" + }, + { + "name": "shuffle write time", + "accumulatorId": 10, + "metricType": "nsTiming" + }, + { + "name": "records read", + "accumulatorId": 7, + "metricType": "sum" + }, + { + "name": "local bytes read", + "accumulatorId": 5, + "metricType": "size" + }, + { + "name": "fetch wait time", + "accumulatorId": 6, + "metricType": "timing" + }, + { + "name": "remote bytes read", + "accumulatorId": 3, + "metricType": "size" + }, + { + "name": "local blocks read", + "accumulatorId": 2, + "metricType": "sum" + }, + { + "name": "remote blocks read", + "accumulatorId": 1, + "metricType": "sum" + }, + { + "name": "remote bytes read to disk", + "accumulatorId": 4, + "metricType": "size" + }, + { + "name": "shuffle bytes written", + "accumulatorId": 8, + "metricType": "size" + } + ] + }, + "time": 0, + "modifiedConfigs": {} +} +{ + "Event": "SparkListenerApplicationEnd", + "Timestamp": 0 +}

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 09:06:04
+
+

*Thread Reply:* I think this is related to job.commit() that probably stops context underneath

+ + + +
+ ✅ Sheeri Cabral (Collibra) +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 09:06:33
+
+

*Thread Reply:* This is probably the same bug: https://github.com/OpenLineage/OpenLineage/issues/2513 but manifests differently

+
+ + + + + + + +
+
Labels
+ integration/spark +
+ +
+
Comments
+ 14 +
+ + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Rodrigo Maia + (rodrigo.maia@manta.io) +
+
2024-04-03 09:45:59
+
+

*Thread Reply:* can you try without the job.commit()?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-04-03 09:54:39
+
+

*Thread Reply:* Sure!

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-04-03 09:56:31
+
+

*Thread Reply:* BTW it makes sense that if the Spark listener is disabled, the OpenLineage integration shouldn’t even try. (If we removed that line, it doesn’t feel like the integration would actually work….)

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 09:57:51
+
+

*Thread Reply:* you mean removing this? +conf.set("spark.extraListeners","io.openlineage.spark.agent.OpenLineageSparkListener") +if you don't set it, none of our code is actually being loaded

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Rodrigo Maia + (rodrigo.maia@manta.io) +
+
2024-04-03 09:59:25
+
+

*Thread Reply:* I meant removing the job.init and job.commit for testing purposes. Glue should work without those.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-04-03 12:47:03
+
+

*Thread Reply:* We removed job.commit, same error. Should we also remove job.init?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-04-03 12:48:06
+
+

*Thread Reply:* Won’t removing this change the functionality? +job.init(args[‘JOB_NAME’], args)

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 13:22:11
+
+

*Thread Reply:* interesting - maybe something else stops the job explicitly underneath on Glue?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-04-03 13:38:02
+
+

*Thread Reply:* Will have a look.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
DEEVITH NAGRAJ + (deevithraj435@gmail.com) +
+
2024-04-03 23:09:10
+
+

*Thread Reply:* Hi all, +I'm working with Sheeri on this, so couple of queries,

+ +
  1. tried to set("spark.openlineage.transport.location","/sample.txt>"): the job succeeds but there is no output in the sample.txt file. (however there are some files created in /sparkHistoryLogs and /sparkHistoryLogs/output), I don't see the OL output file here.

  2. set("spark.openlineage.transport.type","console"): the job fails with “UNCLASSIFIED_ERROR; An error occurred while calling o110.load. Cannot call methods on a stopped SparkContext.”

  3. if we are using http as transport.type, then can we use basic auth instead of api_key?
+
+ + + +
+ ❤️ Sheeri Cabral (Collibra) +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-04 05:32:05
+
+

*Thread Reply:* > 3. if we are using http as transport.type, then can we use basic auth instead of api_key? +Would be good to add that to HttpTransport 🙂

+
+ + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
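On the auth question in this thread: until basic auth lands in HttpTransport, `api_key` is the auth type the Spark integration documents today. A minimal sketch of the conf keys, rendered as spark-submit arguments — the URL and key values are placeholders, not real endpoints:

```python
# Sketch: OpenLineage HTTP transport settings with API-key auth, per the
# Spark integration docs. URL and apiKey below are placeholders.
OL_HTTP_CONF = {
    "spark.extraListeners": "io.openlineage.spark.agent.OpenLineageSparkListener",
    "spark.openlineage.transport.type": "http",
    "spark.openlineage.transport.url": "http://marquez.example.com:5000",  # placeholder
    "spark.openlineage.transport.auth.type": "api_key",
    "spark.openlineage.transport.auth.apiKey": "changeme",  # placeholder secret
}

def to_spark_submit_args(conf: dict) -> list:
    """Render a conf dict as repeated --conf key=value arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args
```

The same keys can be set on a SparkConf or in spark-defaults.conf; the key names are what matters here.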
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-04 05:33:16
+
+

*Thread Reply:* > 1. tried to set("spark.openlineage.transport.location","<|s3:<s3bucket>/sample.txt>") then the job succeeds but no output in the sample.txt file. (however there are some files created in /sparkHistoryLogs and /sparkHistoryLogs/output), I dont see the OL output file here.
+Yeah, FileTransport does not work with object storage - it needs to be a regular filesystem. I don't know if we can make it work without pulling in a lot of dependencies and making it significantly more complex - but of course we'd like to see such a contribution

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-04-04 08:11:44
+
+

*Thread Reply:* @DEEVITH NAGRAJ yes, that’s why the PoC is to have the sparklineage use the transport type of “console” - we can’t save to files in S3.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-04-04 08:12:54
+
+

*Thread Reply:* @DEEVITH NAGRAJ if we can get it to work in console, and CloudWatch shows us openlineage data, then we can change the transport type to an API and set up fluentd to collect the data.

+ +

BTW yesterday another customer got it working in console, and Rodrigo from this thread also saw it working in console, so we know it does work in general 😄

+ + + +
+ 🙌 DEEVITH NAGRAJ +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
DEEVITH NAGRAJ + (deevithraj435@gmail.com) +
+
2024-04-04 11:47:20
+
+

*Thread Reply:* yes Sheeri, I agree we need to get it to work in the console. I don't see anything in CloudWatch, and the error is thrown when we set("spark.openlineage.transport.type","console"): the job fails with “UNCLASSIFIED_ERROR; An error occurred while calling o110.load. Cannot call methods on a stopped SparkContext.”

+ +

do we need to specify the Scala version in .set("spark.jars.packages","io.openlineage:openlineage-spark:1.10.2"), like .set("spark.jars.packages","io.openlineage:openlineage-spark_2.13:1.10.2")? is that causing the issue?

+ + + +
+ ❤️ Sheeri Cabral (Collibra) +
+ +
+
+
+
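On the Scala-suffix question above: recent openlineage-spark releases publish Scala-suffixed artifacts, and the suffix has to match the Scala binary version of the Spark distribution (from PySpark it can be read via `spark.sparkContext._jvm.scala.util.Properties.versionNumberString()`). A small sketch of building the coordinate — the default version is just the one discussed in this thread:

```python
def openlineage_artifact(scala_binary_version: str, ol_version: str = "1.10.2") -> str:
    """Build the Maven coordinate for the openlineage-spark listener jar.
    The artifact id carries a Scala suffix, so it must match the Scala
    binary version of the Spark distribution (2.12 or 2.13)."""
    if scala_binary_version not in ("2.12", "2.13"):
        raise ValueError(f"unsupported Scala binary version: {scala_binary_version}")
    return f"io.openlineage:openlineage-spark_{scala_binary_version}:{ol_version}"
```

A mismatched suffix typically surfaces as NoSuchMethodError/ClassNotFoundException at listener startup rather than a clean failure.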
+ + + + + +
+
+ + + + +
+ +
Sheeri Cabral (Collibra) + (sheeri.cabral@collibra.com) +
+
2024-04-04 14:03:37
+
+

*Thread Reply:* Awesome! We’ve got it so the job succeeds when we set the transport type to “console”. Anyone have any tips on where to find it in CloudWatch? the job itself has a dozen or so different logs and we’re clicking all of them, but maybe there’s an easier way?

+ + + +
+
+
+
+ + + + + +
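One way to narrow the CloudWatch search: the console transport writes each event as a single JSON document through the driver's logger, so filtering the driver log stream for lines carrying an "eventType" field finds them. A hedged sketch — the exact log-line prefix varies with the logging configuration:

```python
import json

def extract_openlineage_events(log_text: str) -> list:
    """Pull OpenLineage event JSON out of captured driver-log text (e.g. a
    CloudWatch log stream). Looks for lines with an "eventType" field and
    parses the JSON payload starting at the first brace."""
    events = []
    for line in log_text.splitlines():
        start = line.find("{")
        if start == -1 or '"eventType"' not in line:
            continue
        try:
            events.append(json.loads(line[start:]))
        except json.JSONDecodeError:
            continue
    return events
```

In the CloudWatch console, a log-stream filter on `eventType` should achieve the same thing without downloading the logs.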
+
+ + + + +
+ +
Mark de Groot + (mdegroot@ilionx.com) +
+
2024-04-03 10:15:27
+
+

Hi everyone, I started implementing OpenLineage in our solution two weeks ago, but I've run into some problems and quite frankly I don't understand what I'm doing wrong. +The situation is: we are using Azure Synapse with notebooks and we want to pick up the data lineage. I have found a lot of documentation about Databricks in combination with OpenLineage, but there is not much documentation on Synapse in combination with OpenLineage. I've installed the newest library "openlineage-1.10.2" in the Synapse Apache Spark packages (so far so good). The next step was to configure the Apache Spark configuration; based on a blog I found, I filled in the following properties: +spark.extraListeners - io.openlineage.spark.agent.OpenLineageSparkListener +spark.openlineage.host – <https://functionapp.azurewebsites.net/api/function> +spark.openlineage.namespace – synapse name +spark.openlineage.url.param.code – XXXX +spark.openlineage.version – 1

+ +

I’m not sure if the namespace is good, I think it's the name of synapse? But the moment I want to run the Synapse notebook (creating a simple dataframe) it shows me an error

+ +

Py4JJavaError Traceback (most recent call last) Cell In [5], line 1 ----&gt; 1 df = spark.read.load('<abfss://bronsedomein1@xxxxxxxx.dfs.core.windows.net/adventureworks/vendors.parquet>', format='parquet') **2** display(df) +Py4JJavaError: An error occurred while calling o4060.load. +: org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.

+ +

I can’t figure out what I’m doing wrong, does somebody have a clue?

+ +

Thanks, +Mark

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-03 10:35:46
+
+

*Thread Reply:* this error seems unrelated to openlineage to me, can you try removing all the openlineage related properties from the config and testing this out just to rule that out?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mark de Groot + (mdegroot@ilionx.com) +
+
2024-04-03 10:39:30
+
+

*Thread Reply:* Hey Harel,

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mark de Groot + (mdegroot@ilionx.com) +
+
2024-04-03 10:40:49
+
+

*Thread Reply:* Yes, I removed all the related OpenLineage properties, and (of course 😉) it's working fine. But the moment I fill in the properties as mentioned above, it gives me the error.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-03 10:45:41
+
+

*Thread Reply:* thanks for checking, wanted to make sure. 🙂

+ + + +
+ 👍 Mark de Groot +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Harel Shein + (harel.shein@gmail.com) +
+
2024-04-03 10:48:03
+
+

*Thread Reply:* can you try only setting +spark.extraListeners = io.openlineage.spark.agent.OpenLineageSparkListener +spark.jars.packages = io.openlineage:openlineage-spark_2.12:1.10.2 +spark.openlineage.transport.type = console +?

+ + + +
+
+
+
+ + + + + +
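For anyone following along, the three minimal properties suggested above can be expressed as a dict and fed to a session builder (PySpark 3.4+ accepts `config(map=...)`), or set one by one; a quick check helper makes it easy to rule out a missing key before chasing listener errors. This is a sketch for debugging, not a verified Synapse config:

```python
# Sketch: the minimal debug setup — listener, jar, and console transport.
# Artifact version is the one discussed in this thread.
MINIMAL_CONSOLE_CONF = {
    "spark.extraListeners": "io.openlineage.spark.agent.OpenLineageSparkListener",
    "spark.jars.packages": "io.openlineage:openlineage-spark_2.12:1.10.2",
    "spark.openlineage.transport.type": "console",
}

def missing_openlineage_keys(conf: dict) -> list:
    """Return which of the three minimal keys are absent from a session
    config, to sanity-check the setup before debugging listener behavior."""
    required = (
        "spark.extraListeners",
        "spark.jars.packages",
        "spark.openlineage.transport.type",
    )
    return [k for k in required if k not in conf]
```

With only these three set, events land in the driver log rather than any backend, which isolates config problems from transport problems.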
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-03 12:01:20
+
+

*Thread Reply:* @Mark de Groot are you stopping the job using spark.stop() or similar command?

+ + + +
+ 👍 Mark de Groot +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mark de Groot + (mdegroot@ilionx.com) +
+
2024-04-03 12:18:21
+
+

*Thread Reply:* So when i Run the default value in Synapse

+ +
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Mark de Groot + (mdegroot@ilionx.com) +
+
2024-04-03 12:19:49
+
+

*Thread Reply:* Everything is working fine, but when I use the following properties +I'm getting an error, when trying e.q to create a Dataframe.

+ +
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-04-03 11:23:31
+
+

@channel + Accenture+Confluent's Open Standards for Data Lineage roundtable is happening on April 25th, featuring: +• Kai Waehner (Confluent) +• @Mandy Chessell (Egeria) +• @Julien Le Dem (OpenLineage) +• @Jens Pfau (Google Cloud) +• @Ernie Ostic (Manta/IBM) +• @Sheeri Cabral (Collibra) +• Austin Kronz (Atlan) +• @Luigi Scorzato (moderator, Accenture) +Not to be missed! Register at the link.

+
+
events.confluent.io
+ + + + + + + + + + + + + + + + + +
+ + + +
+ 🔥 Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Bassim EL Baroudi + (bassim.elbaroudi@gmail.com) +
+
2024-04-03 12:58:12
+
+

Hi everyone, +I'm trying to launch a Spark job with OpenLineage integration. The Spark version is 3.5.0. +The configuration used:

+ +

spark.jars.packages=io.openlineage:openlineage-spark_2.12:1.10.2 +spark.extraListeners=io.openlineage.spark.agent.OpenLineageSparkListener +spark.openlineage.transport.url=http://marquez.dcp.svc.cluster.local:8087 +spark.openlineage.namespace=pyspark +spark.openlineage.transport.type=http +spark.openlineage.facets.disabled="[spark.logicalPlan;]" +spark.openlineage.debugFacet=enabled

+ +

the spark job exits with the following error: +java.lang.NoSuchMethodError: 'org.apache.spark.sql.SQLContext org.apache.spark.sql.execution.SparkPlan.sqlContext()' + at io.openlineage.spark.agent.lifecycle.ContextFactory.createSparkSQLExecutionContext(ContextFactory.java:32) + at io.openlineage.spark.agent.OpenLineageSparkListener.lambda$getSparkSQLExecutionContext$4(OpenLineageSparkListener.java:172) + at java.base/java.util.HashMap.computeIfAbsent(HashMap.java:1220) + at java.base/java.util.Collections$SynchronizedMap.computeIfAbsent(Collections.java:2760) + at io.openlineage.spark.agent.OpenLineageSparkListener.getSparkSQLExecutionContext(OpenLineageSparkListener.java:171) + at io.openlineage.spark.agent.OpenLineageSparkListener.sparkSQLExecStart(OpenLineageSparkListener.java:125) + at io.openlineage.spark.agent.OpenLineageSparkListener.onOtherEvent(OpenLineageSparkListener.java:117) + at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:100) + at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28) + at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) + at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) + at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117) + at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101) + at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105) + at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105) + at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23) + at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) + at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100) + at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96) + at 
org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1356) + at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96) +24/04/03 13:23:39 INFO SparkContext: SparkContext is stopping with exitCode 0. +24/04/03 13:23:39 ERROR Utils: throw uncaught fatal error in thread spark-listener-group-shared +java.lang.NoSuchMethodError: 'org.apache.spark.sql.SQLContext org.apache.spark.sql.execution.SparkPlan.sqlContext()' + at io.openlineage.spark.agent.lifecycle.ContextFactory.createSparkSQLExecutionContext(ContextFactory.java:32) + at io.openlineage.spark.agent.OpenLineageSparkListener.lambda$getSparkSQLExecutionContext$4(OpenLineageSparkListener.java:172) + at java.base/java.util.HashMap.computeIfAbsent(HashMap.java:1220) + at java.base/java.util.Collections$SynchronizedMap.computeIfAbsent(Collections.java:2760) + at io.openlineage.spark.agent.OpenLineageSparkListener.getSparkSQLExecutionContext(OpenLineageSparkListener.java:171) + at io.openlineage.spark.agent.OpenLineageSparkListener.sparkSQLExecStart(OpenLineageSparkListener.java:125) + at io.openlineage.spark.agent.OpenLineageSparkListener.onOtherEvent(OpenLineageSparkListener.java:117) + at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:100) + at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28) + at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) + at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) + at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117) + at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101) + at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105) + at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105) + at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23) + at 
scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) + at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100) + at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96) + at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1356) + at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96) +Exception in thread "spark-listener-group-shared" java.lang.NoSuchMethodError: 'org.apache.spark.sql.SQLContext org.apache.spark.sql.execution.SparkPlan.sqlContext()' + at io.openlineage.spark.agent.lifecycle.ContextFactory.createSparkSQLExecutionContext(ContextFactory.java:32) + at io.openlineage.spark.agent.OpenLineageSparkListener.lambda$getSparkSQLExecutionContext$4(OpenLineageSparkListener.java:172) + at java.base/java.util.HashMap.computeIfAbsent(HashMap.java:1220) + at java.base/java.util.Collections$SynchronizedMap.computeIfAbsent(Collections.java:2760) + at io.openlineage.spark.agent.OpenLineageSparkListener.getSparkSQLExecutionContext(OpenLineageSparkListener.java:171) + at io.openlineage.spark.agent.OpenLineageSparkListener.sparkSQLExecStart(OpenLineageSparkListener.java:125) + at io.openlineage.spark.agent.OpenLineageSparkListener.onOtherEvent(OpenLineageSparkListener.java:117) + at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:100) + at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28) + at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) + at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) + at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117) + at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101) + at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105) + at 
org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105) + at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23) + at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) + at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100) + at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96) + at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1356)

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-04 02:29:34
+
+

*Thread Reply:* Hey @Bassim EL Baroudi, what environment are you running the Spark job in? Is this a real-life production job, or are you able to provide a code snippet which reproduces it?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-04 03:31:29
+
+

*Thread Reply:* Do you get any OpenLineage events, like START events, and see this exception at the end of the job, or does it occur at the beginning, resulting in no events emitted?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-04-03 16:16:41
+
+

@channel +This month’s TSC meeting is next Wednesday the 10th at 9:30am PT. +On the tentative agenda (additional items TBA): +• announcements + ◦ upcoming events including the Accenture+Confluent roundtable on 4/25 +• recent release highlights +• discussion items + ◦ supporting job-to-job, as opposed to job-dataset-job, dependencies in the spec + ◦ improving naming +• open discussion +More info and the meeting link can be found on the website. All are welcome! Do you have a discussion topic, use case or integration you’d like to demo? Reply here or DM me to be added to the agenda.

+
+
openlineage.io
+ + + + + + + + + + + + + + + +
+ + + +
+ 👍 Paweł Leszczyński, Sheeri Cabral (Collibra), Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-03 22:19:15
+
+

Hi! How can I pass multiple Kafka brokers when using OpenLineage with Flink? It appears Marquez doesn't allow namespaces with commas.

+ +

namespace 'broker1,broker2,broker3' must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), at (@), plus (+), dashes (-), colons (:), equals (=), semicolons (;), slashes (/) or dots (.) with a maximum length of 1024 characters.

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-04 02:36:19
+
+

*Thread Reply:* Kafka dataset naming already has an open issue -> https://github.com/OpenLineage/OpenLineage/issues/560

+ +

I think the problem you raised deserves a separate one. Feel free to create it. I think we can still modify the broker separator to a semicolon.

+
+ + + + + + + +
+
Comments
+ 1 +
+ + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
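Until the separator question above is settled upstream, one workaround is to build the namespace yourself before it reaches Marquez: the OpenLineage naming conventions use a `kafka://` scheme for Kafka, and semicolons are in Marquez's allowed character set while commas are not. A sketch — sorting the brokers is my own addition, to keep the namespace stable across broker orderings:

```python
import re

# Character set taken from the Marquez validation error quoted in this thread.
ALLOWED = re.compile(r"^[A-Za-z0-9_@+:=;/.\-]{1,1024}$")

def kafka_namespace(bootstrap_servers: str) -> str:
    """Build a Marquez-safe namespace from a comma-separated broker list,
    joining brokers with semicolons and sorting for determinism."""
    brokers = sorted(b.strip() for b in bootstrap_servers.split(",") if b.strip())
    ns = "kafka://" + ";".join(brokers)
    if not ALLOWED.match(ns):
        raise ValueError(f"invalid namespace: {ns}")
    return ns
```

The same normalization applied on both the producer and consumer side keeps the two jobs pointing at one dataset namespace.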
+
+ + + + +
+ +
Fabio Manganiello + (fabio.manganiello@booking.com) +
+
2024-04-04 17:46:31
+
+

FYI I've moved https://github.com/OpenLineage/OpenLineage/pull/2489 to https://github.com/OpenLineage/OpenLineage/pull/2578 - I mistakenly included a couple of merge commits upon git rebase --signoff. Hopefully the tests should pass now (there were a couple of macro templates that still reported the old arguments). Is it still in time to be squeezed into 1.11.0? It's not super-crucial (for us at least), since we have already copied the code of those macros into our operators implementation, but since the same fix has already been merged on the Airflow side it'd be good to keep things in sync (cc @Maciej Obuchowski @Kacper Muda)

+
+ + + + + + + + + + + + + + + + +
+
+ + + + + + + + + + + + + + + + +
+ + + +
+ 👀 Maciej Obuchowski +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Fabio Manganiello + (fabio.manganiello@booking.com) +
+
2024-04-04 18:43:05
+
+

*Thread Reply:* The tests are passing now

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-05 01:37:57
+
+

I wanted to ask if there is any roadmap for adding more support for Flink sources and sinks to OpenLineage, for example: +• Kinesis +• Hudi +• Iceberg SQL +• Flink CDC +• Opensearch +or how one can contribute to those?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Kacper Muda + (kacper.muda@getindata.com) +
+
2024-04-05 02:48:41
+
+

*Thread Reply:* Hey, if you feel like contributing, take a look at our contributors guide 🙂

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 07:14:55
+
+

*Thread Reply:* I think the most important thing on the Flink side is working with the Flink community on implementing https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener - as this allows us to move the implementation to the dedicated connectors

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
dolfinus + (martinov_m_s_@mail.ru) +
+
2024-04-05 09:47:22
+
+

👋 Hi everyone!

+ + + +
+ 👋 Michael Robinson, Jakub Dardziński, Harel Shein, Damien Hawes +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 09:56:35
+
+

*Thread Reply:* Hello 👋

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-04-05 11:30:01
+
+

@channel +We released OpenLineage 1.11.3, featuring a new package to support built-in lineage in Spark extensions and a telemetry mechanism in the Spark integration, among many other additions and fixes. +Additions +• Common: add support for SCRIPT-type jobs in BigQuery #2564 @kacpermuda +• Spark: support for built-in lineage extraction #2272 @pawel-big-lebowski +• Spark/Java: add support for Micrometer metrics #2496 @mobuchowski +• Spark: add support for telemetry mechanism #2528 @mobuchowski +• Spark: support query option on table read #2556 @mobuchowski +• Spark: change SparkPropertyFacetBuilder to support recording Spark runtime #2523 @Ruihua98 +• Spec: add fileCount to dataset stat facets #2562 @dolfinus +There were also many bug fixes -- please see the release notes for details. +Thanks to all the contributors, with a shout out to new contributor @dolfinus (who contributed 5 PRs to the release and already has 4 more open!) and @Maciej Obuchowski and @Jakub Dardziński for the after-hours CI fixes! +Release: https://github.com/OpenLineage/OpenLineage/releases/tag/1.11.3 +Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md +Commit history: https://github.com/OpenLineage/OpenLineage/compare/1.10.2...1.11.3 +Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage +PyPI: https://pypi.org/project/openlineage-python/

+ + + +
+ 🔥 Maciej Obuchowski, Jorge, taosheng shi, Ricardo Gaspar +
+ +
+ 🚀 Maciej Obuchowski, taosheng shi +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
taosheng shi + (taoshengshi01@gmail.com) +
+
2024-04-05 12:21:34
+
+

👋 Hi everyone!

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
taosheng shi + (taoshengshi01@gmail.com) +
+
2024-04-05 12:22:10
+
+

*Thread Reply:* This is Taosheng from GitData Labs (https://gitdata.ai/) and we are building a data versioning tool for responsible AI/ML:

+ +

A Git-like version control file system for data lineage & data collaboration. +https://github.com/GitDataAI/jiaozifs

+
+
gitdata.ai
+ + + + + + + + + + + + + + + +
+
+ + + + + + + +
+
Website
+ <https://jiaozifs.com> +
+ +
+
Stars
+ 34 +
+ + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Maciej Obuchowski + (maciej.obuchowski@getindata.com) +
+
2024-04-05 12:23:38
+
+

*Thread Reply:* hello 👋

+ + + +
+ 👋 taosheng shi +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
taosheng shi + (taoshengshi01@gmail.com) +
+
2024-04-05 12:26:56
+
+

*Thread Reply:* I came across OpenLineage on Google and think we would be able to contribute with our products & skills. I was thinking maybe I could start sharing some of them here, and seeing if there is something that feels like it could be interesting to co-build on/through OpenLineage and co-market together.

+ + + +
+ ❤️ Sheeri Cabral (Collibra) +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
taosheng shi + (taoshengshi01@gmail.com) +
+
2024-04-05 12:27:06
+
+

*Thread Reply:* Would somebody be open to discuss any open opportunities for us together?

+ + + +
+ 👍 Michael Robinson +
+ +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Michael Robinson + (michael.robinson@astronomer.io) +
+
2024-04-05 14:55:20
+
+

*Thread Reply:* 👋 welcome and thanks for joining!

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-08 03:02:10
+
+

Hi everyone! I wanted to implement cross-stack data lineage across Flink and Spark, but it seems that the Iceberg table gets registered as different datasets in both (Spark at the top, Flink at the bottom), so it doesn't get added to the same DAG. In Spark, the Iceberg table gets the database added to its name. I'm seeing that @Paweł Leszczyński committed Spark/Flink Unify Dataset naming from URI objects (https://github.com/OpenLineage/OpenLineage/pull/2083/files#), so I'm not sure what could be going on

+ + +
+ + + + + + + + + +
+
+ + + + + + + + + +
+ + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-08 04:53:53
+
+

*Thread Reply:* Looks like this method https://github.com/OpenLineage/OpenLineage/blob/1.11.3/integration/spark/shared/src/main/java/io/openlineage/spark/agent/util/PathUtils.java#L164 creates name with (tb+database)

+ +

In general, I would say we should add a naming convention here -> https://openlineage.io/docs/spec/naming/ . I think the db.table format is fine as we're using it for other sources.

+ +

IcebergSinkVisitor in the Flink integration does not seem to add a symlink facet pointing to the Iceberg table with the schema included. You can try extending it with a dataset symlink facet, as done for Spark.

+
+
openlineage.io
+ + + + + + + + + + + + + + + +
+
+ + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-08 06:35:59
+
+

*Thread Reply:* How do you suggest we do so? creating a PR, extending IcebergSink Visitor or do it manually through spark as in this example https://github.com/OpenLineage/workshops/blob/main/spark/dataset_symlinks.ipynb

+
+ + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-08 07:26:35
+
+

*Thread Reply:* is there any way to create a symlink via marquez api?

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-08 07:26:44
+
+

*Thread Reply:* trying to figure out whats the easiest approach

+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-08 07:44:54
+
+

*Thread Reply:* there are two possible conventions for pointing to iceberg dataset: +• its physical location +• namespace pointing to iceberg catalog, name pointing to schema+table +Flink integration uses physical location only. IcebergSinkVisitor should add additional facet - dataset symlink facet

+ + + +
+
+
+
+ + + + + +
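For reference, this is the shape of the dataset "symlinks" facet being discussed — the piece that links a physically-named dataset (e.g. the Iceberg warehouse path) to its catalog identity. A sketch only: the namespace/name values are placeholders and the `_schemaURL` facet version here is illustrative, not authoritative:

```python
def symlinks_facet(catalog_namespace: str, db_table: str) -> dict:
    """Build a dataset 'symlinks' facet payload linking a physical-location
    dataset to its catalog identity (schema+table). Values passed in and the
    schema URL version are illustrative placeholders."""
    return {
        "symlinks": {
            "_producer": "https://github.com/OpenLineage/OpenLineage",
            "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/SymlinksDatasetFacet.json",
            "identifiers": [
                {"namespace": catalog_namespace, "name": db_table, "type": "TABLE"}
            ],
        }
    }
```

When both the Spark and Flink events attach an identifier with the same namespace and name, a lineage backend can resolve the two physical datasets to one logical table.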
+
+ + + + +
+ +
Paweł Leszczyński + (pawel.leszczynski@getindata.com) +
+
2024-04-08 07:46:37
+
+

*Thread Reply:* just like spark integration is doing +here -> https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/shared/src/main/java/io/openlineage/spark/agent/util/PathUtils.java#L86

+
+ + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+ + + + + +
+
+ + + + +
+ +
Francisco Morillo + (fmorillo@amazon.es) +
+
2024-04-08 15:01:10
+
+

*Thread Reply:* I have been testing by first modifying the event that gets emitted, but in the lineage I am seeing duplicate datasets, as the physical location for Flink is also different from the one Spark uses

From 4c6f2e03fdb94fab4577fc6068639b4fce289551 Mon Sep 17 00:00:00 2001 From: merobi-hub Date: Mon, 8 Apr 2024 16:17:52 -0400 Subject: [PATCH 2/2] Remove github bot channels. Signed-off-by: merobi-hub --- channel/github-discussions/index.html | 6687 ---- channel/github-notifications/index.html | 40921 ---------------------- 2 files changed, 47608 deletions(-) delete mode 100644 channel/github-discussions/index.html delete mode 100644 channel/github-notifications/index.html diff --git a/channel/github-discussions/index.html b/channel/github-discussions/index.html deleted file mode 100644 index 83e1886..0000000 --- a/channel/github-discussions/index.html +++ /dev/null @@ -1,6687 +0,0 @@ - - - - - - Slack Export - #github-discussions - - - - - -
- - - -
- - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-02-03 17:55:40
-
-

@Julien Le Dem has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-02-03 17:56:32
-
-

/github subscribe OpenLineage/OpenLineage discussions

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-02-03 17:57:39
-
-

/github subscribe OpenLineage/OpenLineage discussions

- - - -
-
-
-
- - - - - -
-
- -
- - -
- -
B01UH3Z8K8V - -
-
2023-02-03 17:57:39
-
-

✅ Subscribed to OpenLineage/OpenLineage. This channel will receive notifications for issues, pulls, commits, releases, deployments, discussions

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson (michael.robinson@astronomer.io)
2023-02-03 21:22:08
@Michael Robinson has joined the channel

Mike Dillion (mike.dillion@gmail.com)
2023-02-11 18:51:36
@Mike Dillion has joined the channel

jrich (jasonrich85@icloud.com)
2023-03-10 14:52:18
@jrich has joined the channel

Dev Jadhav (dev.jadhav@loxsolution.com)
2023-04-07 08:31:49
@Dev Jadhav has joined the channel

Sudhar Balaji (sudharshan.dataaces@gmail.com)
2023-04-25 07:04:19
@Sudhar Balaji has joined the channel

Yuanli Wang (yuanliw@bu.edu)
2023-05-25 20:33:51
@Yuanli Wang has joined the channel

Nam Nguyen (nam@astrafy.io)
2023-07-14 05:37:30
@Nam Nguyen has joined the channel

Glyn Bowden (HPE) (glyn.bowden@hpe.com)
2023-08-07 13:49:30
@Glyn Bowden (HPE) has joined the channel

GTC (tsungchih.hd@gmail.com)
2023-10-21 04:59:15
@GTC has joined the channel

Sheeri Cabral (Collibra) (sheeri.cabral@collibra.com)
2023-11-09 15:30:23
@Sheeri Cabral (Collibra) has joined the channel

Sheeri Cabral (Collibra) (sheeri.cabral@collibra.com)
2023-11-09 15:30:58
Let me know if I did that wrong ^^^ It’s been a while since I modified someone else’s PR with my own commits.

Harel Shein (harel.shein@gmail.com)
2023-11-16 11:40:06
@Harel Shein has joined the channel

Gowthaman Chinnathambi (gowthamancdev@gmail.com)
2024-01-18 23:26:56
@Gowthaman Chinnathambi has joined the channel

jayant joshi (itsjayantjoshi@gmail.com)
2024-01-24 01:10:27
@jayant joshi has joined the channel

tati (tatiana.alchueyr@astronomer.io)
2024-01-30 12:12:58
@tati has joined the channel

Ewan Lord (ewanlord@gmail.com)
2024-01-31 05:28:13
@Ewan Lord has joined the channel

Josh Fischer (josh@joshfischer.io)
2024-02-03 20:13:56
@Josh Fischer has joined the channel

assia fellague (assia.fellague@canal-plus.com)
2024-02-22 11:15:06
@assia fellague has joined the channel

Santiago Cobos (santiago.cobos@ibm.com)
2024-03-25 16:42:24
@Santiago Cobos has joined the channel

Ray Lacerda (ray.lacerda@live.com)
2024-03-27 21:42:13
@Ray Lacerda has joined the channel

Ray Lacerda (ray.lacerda@live.com)
2024-03-27 21:42:20
@Ray Lacerda has joined the channel
\ No newline at end of file
diff --git a/channel/github-notifications/index.html b/channel/github-notifications/index.html
deleted file mode 100644
index e9331b5..0000000
--- a/channel/github-notifications/index.html
+++ /dev/null
@@ -1,40921 +0,0 @@

Slack Export - #github-notifications
Julien Le Dem (julien@apache.org)
2020-11-03 16:50:07
@Julien Le Dem has joined the channel

Jørn Hansen (jornhansen@gmail.com)
2020-12-19 06:25:50
@Jørn Hansen has joined the channel

Ananth Packkildurai (vananth22@gmail.com)
2020-12-19 15:05:57
@Ananth Packkildurai has joined the channel

Harikiran Nayak (hari@streamsets.com)
2020-12-21 15:12:21
@Harikiran Nayak has joined the channel

Willy Lulciuc (willy@datakin.com)
2020-12-22 19:55:31
@Willy Lulciuc has joined the channel

Alagappan Sethuraman (alagappan.als@gmail.com)
2020-12-23 15:31:21
@Alagappan Sethuraman has joined the channel

Laurent Paris (laurent@datakin.com)
2021-02-01 17:51:37
@Laurent Paris has joined the channel

dorzey (dorzey@gmail.com)
2021-02-02 07:05:40
@dorzey has joined the channel

Alexander Gilfillan (agilfillan@dealerinspire.com)
2021-02-02 19:19:25
@Alexander Gilfillan has joined the channel

Girish Lingappa (glingappa@netflix.com)
2021-02-04 14:56:07
@Girish Lingappa has joined the channel

Edgar Ramírez Mondragón (edgarrm358@gmail.com)
2021-02-08 03:26:38
@Edgar Ramírez Mondragón has joined the channel

Xinbin Huang (bin.huangxb@gmail.com)
2021-02-11 14:09:49
@Xinbin Huang has joined the channel

aliou (aliouamardev@gmail.com)
2021-02-16 11:01:31
@aliou has joined the channel

Arthur Wiedmer (awiedmer@apple.com)
2021-02-25 17:43:24
@Arthur Wiedmer has joined the channel
Victor Shafran (victor.shafran@databand.ai)
2021-03-09 16:44:03
@Victor Shafran has joined the channel

Michael Collado (collado.mike@gmail.com)
2021-04-02 11:59:30
@Michael Collado has joined the channel

Ross Turk (ross@datakin.com)
2021-04-03 21:15:06
@Ross Turk has joined the channel

GitHub (Legacy)
2021-04-16 13:41:47
GitHub app is successfully upgraded in your workspace 🎉
To receive notifications in your private channels, you need to invite the GitHub app /invite @GitHub
GitHub
2021-04-16 14:49:42
[OpenLineage/OpenLineage] Pull request merged by julienledem

GitHub
2021-04-16 14:51:06
[OpenLineage/OpenLineage] Pull request merged by julienledem

GitHub
2021-04-16 17:59:27
[OpenLineage/OpenLineage] Pull request opened by julienledem
Reviewers: rdblue, jcampbell, drewbanin, wslulciuc, mobuchowski, henneberger, mandy-chessell, collado-mike

GitHub
2021-04-22 19:25:58
[OpenLineage/OpenLineage] Pull request opened by MansurAshraf

GitHub
2021-04-23 09:54:40
[OpenLineage/OpenLineage] Issue opened by mobuchowski
Labels: proposal
GitHub
2021-04-23 09:55:12
[OpenLineage/OpenLineage] Pull request opened by mobuchowski

GitHub
2021-04-26 21:21:56
[OpenLineage/OpenLineage] Pull request merged by julienledem

GitHub
2021-04-28 17:12:39
[OpenLineage/OpenLineage] Pull request opened by jquintus

GitHub
2021-05-06 11:59:11
[OpenLineage/OpenLineage] Pull request merged by julienledem

GitHub
2021-05-06 12:00:00
[OpenLineage/OpenLineage] Pull request merged by julienledem

GitHub
2021-05-06 12:02:25
[OpenLineage/OpenLineage] Pull request merged by julienledem

GitHub
2021-05-06 12:16:58
[OpenLineage/OpenLineage] Pull request opened by mobuchowski

GitHub
2021-05-06 20:42:26
[OpenLineage/OpenLineage] Pull request merged by julienledem
Harshal Sheth (harshal@acryl.io)
2021-05-12 17:48:13
@Harshal Sheth has joined the channel

Ross Turk (ross@datakin.com)
2021-05-14 14:23:41
*Thread Reply:* shh don't tell anyone there's a website coming
GitHub
2021-05-17 08:40:15
[OpenLineage/OpenLineage] Pull request opened by mobuchowski

GitHub
2021-05-17 20:52:01
[OpenLineage/OpenLineage] Pull request merged by julienledem

GitHub
2021-05-17 20:52:18
[OpenLineage/OpenLineage] Pull request merged by julienledem

GitHub
2021-05-18 08:46:42
[OpenLineage/OpenLineage] Issue closed by mobuchowski

GitHub
2021-05-18 08:47:39
[OpenLineage/OpenLineage] Issue closed by mobuchowski

GitHub
2021-05-20 09:38:09
[OpenLineage/OpenLineage] New release Release - published by mobuchowski

GitHub
2021-05-20 11:25:56
[OpenLineage/OpenLineage] Pull request closed by julienledem

GitHub
2021-05-20 15:27:30
[OpenLineage/OpenLineage] Issue opened by julienledem

GitHub
2021-05-20 20:33:16
[OpenLineage/OpenLineage] Pull request opened by julienledem
DCO: DCO (https://github.com/OpenLineage/OpenLineage/runs/2635194679)
GitHub
2021-05-21 07:56:39
[OpenLineage/OpenLineage] Pull request opened by mobuchowski
Reviewers: julienledem, wslulciuc, collado-mike

GitHub
2021-05-21 10:49:46
[OpenLineage/OpenLineage] Pull request merged by mobuchowski

GitHub
2021-05-21 10:59:23
[OpenLineage/OpenLineage] New release Release - 0.0.1-rc2 published by mobuchowski

GitHub
2021-05-21 11:04:10
[OpenLineage/OpenLineage] Pull request opened by mobuchowski
Comments: 1

GitHub
2021-05-21 11:05:59
[OpenLineage/OpenLineage] Pull request merged by mobuchowski

GitHub
2021-05-21 11:07:05
[OpenLineage/OpenLineage] New release Release - 0.0.1-rc3 published by mobuchowski

GitHub
2021-05-21 12:33:47
[OpenLineage/OpenLineage] Issue closed by mobuchowski

GitHub
2021-05-25 12:02:02
[OpenLineage/OpenLineage] Issue opened by mobuchowski
Assignees: mobuchowski
GitHub
2021-05-26 07:47:32
[OpenLineage/OpenLineage] Pull request closed by mobuchowski

GitHub
2021-05-26 07:47:57
[OpenLineage/OpenLineage] Pull request opened by mobuchowski

GitHub
2021-05-26 11:51:30
[OpenLineage/OpenLineage] New release Release - 0.0.0-rc4 published by mobuchowski

GitHub
2021-05-27 09:53:39
[OpenLineage/OpenLineage] Issue closed by mobuchowski

GitHub
2021-05-28 04:30:21
[OpenLineage/OpenLineage] Pull request merged by mobuchowski

GitHub
2021-05-28 20:08:27
[OpenLineage/OpenLineage] Issue opened by julienledem

GitHub
2021-05-28 20:08:33
[OpenLineage/OpenLineage] Issue opened by julienledem

GitHub
2021-06-01 11:10:54
[OpenLineage/OpenLineage] Issue closed by mobuchowski

GitHub
2021-06-02 10:20:27
[OpenLineage/OpenLineage] Pull request opened by mobuchowski
Reviewers: julienledem, wslulciuc, collado-mike
GitHub
2021-06-04 22:05:33
[OpenLineage/OpenLineage] Pull request merged by julienledem

GitHub
2021-06-08 17:26:30
[OpenLineage/website] Pull request opened by wslulciuc
Reviewers: rossturk

GitHub
2021-06-09 02:54:03
[OpenLineage/OpenLineage] Pull request merged by wslulciuc

GitHub
2021-06-09 16:14:28
[OpenLineage/website] Pull request merged by rossturk

GitHub
2021-06-09 18:00:27
[OpenLineage/OpenLineage] New release Release - 0.0.1-rc5 published by mobuchowski

GitHub
2021-06-10 07:40:24
[OpenLineage/OpenLineage] Pull request opened by mobuchowski

GitHub
2021-06-10 07:53:45
[OpenLineage/OpenLineage] New release Release - 0.0.1-rc6 published by mobuchowski

GitHub
2021-06-10 16:47:31
[OpenLineage/website] Pull request opened by collado-mike
Reviewers: rossturk

GitHub
2021-06-10 21:32:40
[OpenLineage/website] Pull request merged by rossturk
GitHub
2021-06-11 12:02:28
[OpenLineage/website] Pull request opened by wslulciuc
Reviewers: rossturk

GitHub
2021-06-14 12:33:26
[OpenLineage/website] Pull request merged by rossturk

GitHub
2021-06-16 10:59:14
[OpenLineage/OpenLineage] Issue opened by wslulciuc

GitHub
2021-06-16 11:05:51
[OpenLineage/OpenLineage] Issue opened by wslulciuc
Labels: bug

GitHub
2021-06-16 18:49:51
[OpenLineage/OpenLineage] Issue opened by julienledem
Labels: proposal

GitHub
2021-06-16 19:00:20
[OpenLineage/OpenLineage.github.io] Pull request opened by rossturk

GitHub
2021-06-16 19:03:38
[OpenLineage/OpenLineage.github.io] Pull request merged by julienledem

GitHub
2021-06-16 19:40:28
[OpenLineage/website] is now public!
OpenLineage (https://github.com/OpenLineage)
❤️ Willy Lulciuc, Ross Turk

GitHub
2021-06-16 19:58:29
[OpenLineage/OpenLineage] Issue opened by julienledem
GitHub
2021-06-16 20:26:43
[OpenLineage/website] Pull request opened by julienledem
Reviewers: rossturk
DCO: DCO (https://github.com/OpenLineage/website/runs/2844505752)

GitHub
2021-06-16 22:07:41
[OpenLineage/website] Pull request opened by rossturk

GitHub
2021-06-17 04:18:34
[OpenLineage/OpenLineage] Pull request ready for review by wslulciuc

GitHub
2021-06-17 05:50:20
[OpenLineage/OpenLineage] Issue closed by mobuchowski

GitHub
2021-06-17 13:56:41
[OpenLineage/website] Pull request merged by rossturk

GitHub
2021-06-17 13:59:09
[OpenLineage/website] Pull request opened by wslulciuc

GitHub
2021-06-17 14:26:56
[OpenLineage/website] Pull request merged by rossturk

GitHub
2021-06-17 14:28:14
[OpenLineage/website] Pull request merged by rossturk

GitHub
2021-06-17 16:22:44
[OpenLineage/website] Pull request opened by rossturk
GitHub
2021-06-17 18:07:08
[OpenLineage/website] Pull request merged by rossturk

GitHub
2021-06-17 18:20:34
[OpenLineage/OpenLineage.github.io] Pull request opened by rossturk

GitHub
2021-06-17 18:53:00
[OpenLineage/OpenLineage] Issue opened by julienledem
Assignees: julienledem

GitHub
2021-06-17 21:16:47
[OpenLineage/OpenLineage.github.io] Pull request merged by julienledem

GitHub
2021-06-18 15:16:09
[OpenLineage/OpenLineage] Issue opened by julienledem

GitHub
2021-06-18 15:18:03
[OpenLineage/OpenLineage] Issue opened by julienledem

GitHub
2021-06-18 17:12:11
[OpenLineage/OpenLineage] Pull request merged by julienledem

GitHub
2021-06-21 09:39:30
[OpenLineage/OpenLineage] Pull request opened by mobuchowski
Reviewers: julienledem

GitHub
2021-06-21 17:28:08
[OpenLineage/OpenLineage] Pull request merged by wslulciuc

GitHub
2021-06-22 05:48:49
[OpenLineage/OpenLineage] Pull request merged by mobuchowski
GitHub
2021-06-25 15:36:47
[OpenLineage/website] Pull request opened by collado-mike
Reviewers: rossturk

GitHub
2021-06-25 16:32:06
[OpenLineage/OpenLineage] Issue opened by julienledem

GitHub
2021-06-25 17:46:16
[OpenLineage/OpenLineage] Issue opened by julienledem

GitHub
2021-06-25 19:02:10
[OpenLineage/website] Pull request opened by wslulciuc

GitHub
2021-06-28 14:44:38
[OpenLineage/website] Issue opened by rossturk

GitHub
2021-06-29 05:22:07
[OpenLineage/OpenLineage] Issue opened by wslulciuc

GitHub
2021-06-29 20:26:18
[OpenLineage/OpenLineage] Issue opened by wslulciuc

GitHub
2021-07-01 12:21:57
[OpenLineage/OpenLineage.github.io] Pull request opened by rossturk

GitHub
2021-07-01 14:25:45
[OpenLineage/website] Pull request merged by wslulciuc

GitHub
2021-07-01 14:37:04
[OpenLineage/website] Pull request opened by wslulciuc

GitHub
2021-07-01 14:38:23
[OpenLineage/website] Pull request merged by wslulciuc

GitHub
2021-07-01 14:44:34
[OpenLineage/website] Issue opened by rossturk

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-07-02 10:54:17
-
-

[OpenLineage/OpenLineage] Pull request opened by mobuchowski

-
- - - - - - - -
-
Comments
- 1 -
- -
-
Reviewers
- wslulciuc -
- - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-07-02 16:49:09
-
-

[OpenLineage/OpenLineage] Issue opened by julienledem

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-07-02 16:49:13
-
-

[OpenLineage/OpenLineage] Issue opened by julienledem

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-07-02 16:50:10
-
-

[OpenLineage/OpenLineage] Issue opened by julienledem

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-07-02 16:50:14
-
-

[OpenLineage/OpenLineage] Issue opened by julienledem

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-07-02 16:50:19
-
-

[OpenLineage/OpenLineage] Issue opened by julienledem

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-07-02 16:50:24
-
-

[OpenLineage/OpenLineage] Issue opened by julienledem

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-07-02 16:50:47
-
-

[OpenLineage/OpenLineage] Issue opened by julienledem

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-07-02 16:51:05
-
-

[OpenLineage/OpenLineage] Issue opened by julienledem

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-07-02 16:59:18
-
-

[OpenLineage/OpenLineage] Issue opened by julienledem

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-07-02 16:59:48
-
-

[OpenLineage/OpenLineage] Issue opened by julienledem

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub
2021-07-02 17:14:07
[OpenLineage/OpenLineage.github.io] Pull request closed by rossturk

GitHub
2021-07-02 17:24:26
[OpenLineage/website] Pull request opened by rossturk
Reviewers: phixMe

GitHub
2021-07-07 19:59:01
[OpenLineage/OpenLineage] Issue opened by julienledem

GitHub
2021-07-07 22:27:50
[OpenLineage/website] Pull request merged by rossturk

GitHub
2021-07-07 22:28:09
[OpenLineage/website] Issue closed by rossturk

GitHub
2021-07-07 22:28:11
[OpenLineage/website] Pull request merged by rossturk

GitHub
2021-07-07 22:36:58
[OpenLineage/website] Pull request opened by rossturk
Reviewers: collado-mike
DCO: DCO (https://github.com/OpenLineage/website/runs/3015059078)

GitHub
2021-07-08 03:13:17
[OpenLineage/website] Pull request merged by rossturk

GitHub
2021-07-08 20:01:04
[OpenLineage/website] Pull request opened by rossturk

GitHub
2021-07-08 20:28:58
[OpenLineage/website] Pull request merged by rossturk

GitHub
2021-07-12 12:21:28
[OpenLineage/OpenLineage] Issue opened by collado-mike

GitHub
2021-07-12 12:30:22
[OpenLineage/OpenLineage] Issue opened by collado-mike

GitHub
2021-07-12 12:32:26
[OpenLineage/OpenLineage] Issue opened by collado-mike

GitHub
2021-07-12 14:38:48
[OpenLineage/OpenLineage] Issue opened by collado-mike

GitHub
2021-07-12 20:45:05
[OpenLineage/OpenLineage] Pull request merged by wslulciuc

GitHub
2021-07-13 05:52:02
[OpenLineage/OpenLineage] Pull request ready for review by mobuchowski

GitHub
2021-07-13 06:48:11
[OpenLineage/OpenLineage] Pull request opened by mobuchowski
Reviewers: wslulciuc

GitHub
2021-07-13 11:46:07
[OpenLineage/OpenLineage] Pull request merged by wslulciuc

GitHub
2021-07-13 12:07:23
[OpenLineage/OpenLineage] Issue opened by julienledem

GitHub
2021-07-13 12:36:29
[OpenLineage/website] Pull request opened by collado-mike
Reviewers: rossturk, wslulciuc

GitHub
2021-07-13 14:07:21
[OpenLineage/website] Pull request merged by collado-mike

GitHub
2021-07-16 14:51:03
[OpenLineage/OpenLineage] Pull request opened by OleksandrDvornik
DCO: DCO (https://github.com/OpenLineage/OpenLineage/runs/3089117573)

GitHub
2021-07-16 15:12:57
[OpenLineage/OpenLineage] Pull request closed by OleksandrDvornik

GitHub
2021-07-16 15:36:58
[OpenLineage/OpenLineage] Pull request ready for review by OleksandrDvornik

GitHub
2021-07-19 09:02:13
[OpenLineage/OpenLineage] Pull request opened by OleksandrDvornik
DCO: DCO (https://github.com/OpenLineage/OpenLineage/runs/3104231996)

GitHub
2021-07-19 10:46:18
[OpenLineage/OpenLineage] Pull request closed by OleksandrDvornik

GitHub
2021-07-20 08:50:02
[OpenLineage/OpenLineage] Pull request opened by OleksandrDvornik
DCO: DCO (https://github.com/OpenLineage/OpenLineage/runs/3114158522)

GitHub
2021-07-20 11:53:11
[OpenLineage/OpenLineage] Issue opened by collado-mike
Assignees: OleksandrDvornik

GitHub
2021-07-20 12:03:16
[OpenLineage/OpenLineage] Pull request closed by OleksandrDvornik

GitHub
2021-07-21 10:39:13
[OpenLineage/OpenLineage] Pull request opened by OleksandrDvornik
Comments: 1
Reviewers: wslulciuc, collado-mike

GitHub
2021-07-21 18:05:44
[OpenLineage/OpenLineage] Issue opened by wslulciuc

GitHub
2021-07-22 12:28:05
[OpenLineage/OpenLineage] Pull request merged by collado-mike

GitHub
2021-07-22 15:10:45
[OpenLineage/website] Pull request opened by collado-mike
Reviewers: rossturk
Ross Turk (ross@datakin.com)
2021-07-22 16:26:39
*Thread Reply:* I think I may try to deploy this change at the same time as a blog post about today’s LFAI announcement

GitHub
2021-07-22 18:59:17
[OpenLineage/website] Pull request merged by collado-mike

GitHub
2021-07-22 20:41:23
[OpenLineage/website] Pull request opened by rossturk
Reviewers: julienledem

GitHub
2021-07-22 21:08:30
[OpenLineage/website] Pull request merged by rossturk

Julien Le Dem (julien@apache.org)
2021-07-22 21:37:23
*Thread Reply:* Thanks. I was driving

GitHub
2021-07-23 14:01:40
[OpenLineage/OpenLineage] Issue opened by julienledem

GitHub
2021-07-23 14:02:54
[OpenLineage/OpenLineage] Issue opened by julienledem

GitHub
2021-07-26 07:49:25
[OpenLineage/OpenLineage] Pull request opened by fiskus

GitHub
2021-07-26 11:15:45
[OpenLineage/OpenLineage] Pull request ready for review by mobuchowski

GitHub
2021-07-26 13:32:02
[OpenLineage/OpenLineage] Issue opened by julienledem
Assignees: mobuchowski

GitHub
2021-07-26 14:46:17
[OpenLineage/OpenLineage] Issue opened by julienledem
Labels: proposal

GitHub
2021-07-26 14:55:04
[OpenLineage/OpenLineage] Issue opened by wslulciuc
Assignees: mobuchowski
Labels: bug

GitHub
2021-07-26 15:03:09
[OpenLineage/OpenLineage] Pull request opened by julienledem

GitHub
2021-07-26 16:49:35
[OpenLineage/OpenLineage] Issue opened by wslulciuc
Labels: proposal

GitHub
2021-07-26 19:26:34
[OpenLineage/OpenLineage] Pull request merged by collado-mike

GitHub
2021-07-27 07:11:18
[OpenLineage/OpenLineage] Pull request merged by OleksandrDvornik

GitHub
2021-07-27 07:12:20
[OpenLineage/OpenLineage] Pull request opened by OleksandrDvornik
Comments: 1

GitHub
2021-07-27 08:00:18
[OpenLineage/OpenLineage] Pull request opened by mobuchowski
Reviewers: wslulciuc
DCO: DCO (https://github.com/OpenLineage/OpenLineage/runs/3171451768)

GitHub
2021-07-27 08:00:33
[OpenLineage/OpenLineage] Pull request closed by mobuchowski

GitHub
2021-07-27 10:41:54
[OpenLineage/OpenLineage] Issue closed by wslulciuc

GitHub
2021-07-27 11:52:52
[OpenLineage/OpenLineage] Issue opened by mobuchowski
Assignees: mobuchowski

GitHub
2021-07-27 18:19:19
[OpenLineage/OpenLineage] Pull request merged by julienledem

GitHub
2021-07-27 18:51:14
[OpenLineage/OpenLineage] Pull request opened by julienledem
Reviewers: mandy-chessell, collado-mike

GitHub
2021-07-27 19:05:50
[OpenLineage/OpenLineage] Issue closed by julienledem

GitHub
2021-07-27 19:17:21
[OpenLineage/OpenLineage] Pull request ready for review by wslulciuc

Julien Le Dem (julien@apache.org)
2021-07-27 19:55:17
*Thread Reply:* I reverted this :face_palm: git push origin head

GitHub
2021-07-27 19:53:34
[OpenLineage/OpenLineage] Pull request opened by julienledem
Comments: 1
Reviewers: mobuchowski, OleksandrDvornik, collado-mike

GitHub
2021-07-27 19:58:06
[OpenLineage/website] Pull request opened by rossturk

GitHub
2021-07-27 19:58:59
[OpenLineage/OpenLineage] Issue closed by wslulciuc

GitHub
2021-07-27 19:59:00
[OpenLineage/OpenLineage] Pull request merged by wslulciuc

GitHub
2021-07-27 20:03:11
[OpenLineage/website] Pull request merged by rossturk

GitHub
2021-07-27 21:08:39
[OpenLineage/OpenLineage] Pull request opened by julienledem
Reviewers: mobuchowski, collado-mike

GitHub
2021-07-28 10:07:57
[OpenLineage/OpenLineage] Pull request opened by OleksandrDvornik
Reviewers: wslulciuc, mobuchowski

GitHub
2021-07-28 12:25:23
[OpenLineage/OpenLineage] Pull request merged by mobuchowski

GitHub
2021-07-28 13:12:45
[OpenLineage/OpenLineage] Pull request closed by wslulciuc
GitHub
2021-07-28 18:14:51
[OpenLineage/OpenLineage] Issue closed by wslulciuc

GitHub
2021-07-28 18:44:54
[OpenLineage/OpenLineage] Pull request opened by julienledem

GitHub
2021-07-29 09:45:25
[OpenLineage/OpenLineage] Pull request ready for review by mobuchowski

GitHub
2021-07-29 09:47:31
[OpenLineage/OpenLineage] Pull request merged by OleksandrDvornik

GitHub
2021-07-29 12:39:38
[OpenLineage/OpenLineage] Pull request opened by OleksandrDvornik
Comments: 1

GitHub
2021-07-29 12:44:01
[OpenLineage/OpenLineage] Issue opened by mobuchowski

GitHub
2021-07-29 13:09:24
[OpenLineage/OpenLineage] Pull request merged by OleksandrDvornik

GitHub
2021-07-29 19:41:02
[OpenLineage/OpenLineage] Issue opened by julienledem

GitHub
2021-07-30 08:29:37
[OpenLineage/OpenLineage] Pull request opened by OleksandrDvornik
Reviewers: wslulciuc

GitHub
2021-07-30 09:39:36
[OpenLineage/OpenLineage] Pull request opened by mobuchowski
Reviewers: julienledem, collado-mike

GitHub
2021-07-30 14:11:10
[OpenLineage/OpenLineage] Pull request merged by wslulciuc

GitHub
2021-07-30 15:07:07
[OpenLineage/OpenLineage] Pull request merged by julienledem

GitHub
2021-07-30 15:20:27
[OpenLineage/OpenLineage] Pull request opened by julienledem
Reviewers: collado-mike

GitHub
2021-07-30 17:11:47
[OpenLineage/OpenLineage] Issue opened by knxacgcg

GitHub
2021-07-30 19:42:42
[OpenLineage/OpenLineage] Issue opened by knxacgcg

GitHub
2021-08-02 05:30:04
[OpenLineage/OpenLineage] Issue opened by mobuchowski
Assignees: mobuchowski

GitHub
2021-08-02 05:34:49
[OpenLineage/OpenLineage] Pull request merged by mobuchowski

GitHub
2021-08-02 05:35:17
[OpenLineage/OpenLineage] Issue closed by mobuchowski

GitHub
2021-08-02 05:39:29
[OpenLineage/OpenLineage] Issue closed by OleksandrDvornik

GitHub
2021-08-02 12:20:24
[OpenLineage/OpenLineage] Issue opened by julienledem

GitHub
2021-08-03 10:29:59
[OpenLineage/OpenLineage] Pull request opened by mobuchowski
Reviewers: wslulciuc

GitHub
2021-08-03 10:50:17
[OpenLineage/OpenLineage] Pull request opened by OleksandrDvornik

GitHub
2021-08-03 11:15:16
[OpenLineage/OpenLineage] Pull request opened by OleksandrDvornik

GitHub
2021-08-03 11:30:26
[OpenLineage/OpenLineage] Pull request opened by OleksandrDvornik

GitHub
2021-08-03 11:34:53
[OpenLineage/OpenLineage] Issue closed by julienledem

GitHub
2021-08-03 11:34:53
[OpenLineage/OpenLineage] Issue closed by julienledem

GitHub
2021-08-03 11:34:54
[OpenLineage/OpenLineage] Pull request merged by julienledem

GitHub
2021-08-03 12:21:46
[OpenLineage/OpenLineage] Issue opened by collado-mike
Assignees: mobuchowski

GitHub
2021-08-03 12:29:35
[OpenLineage/OpenLineage] Pull request closed by OleksandrDvornik

GitHub
2021-08-03 13:32:02
[OpenLineage/OpenLineage] Pull request opened by mobuchowski
Reviewers: wslulciuc

GitHub
2021-08-03 17:58:41
[OpenLineage/OpenLineage] Pull request merged by julienledem

GitHub
2021-08-03 17:59:12
[OpenLineage/OpenLineage] Issue closed by julienledem

GitHub
2021-08-04 05:59:21
[OpenLineage/OpenLineage] Pull request merged by OleksandrDvornik

GitHub
2021-08-04 10:27:57
[OpenLineage/OpenLineage] Pull request merged by OleksandrDvornik
Oleksandr Dvornik (oleksandr.dvornik@getindata.com)
2021-08-04 10:42:12
@Oleksandr Dvornik has joined the channel

GitHub
2021-08-04 11:06:12
[OpenLineage/OpenLineage] Pull request opened by mobuchowski
Reviewers: wslulciuc

GitHub
2021-08-04 11:44:24
[OpenLineage/OpenLineage] Pull request merged by wslulciuc

GitHub
2021-08-04 12:43:36
[OpenLineage/OpenLineage] Pull request opened by mobuchowski
Reviewers: wslulciuc

Joe Regensburger (jregensburger@immuta.com)
2021-08-04 13:33:45
@Joe Regensburger has joined the channel

GitHub
2021-08-04 14:38:35
[OpenLineage/OpenLineage] Pull request merged by wslulciuc

GitHub
2021-08-04 18:20:53
[OpenLineage/OpenLineage] Pull request merged by wslulciuc

GitHub
2021-08-04 20:01:21
[OpenLineage/OpenLineage] Pull request merged by julienledem

GitHub
2021-08-04 20:12:17
[OpenLineage/OpenLineage] Issue opened by julienledem

GitHub
2021-08-04 20:14:17
[OpenLineage/OpenLineage] Issue opened by julienledem

GitHub
2021-08-04 22:49:20
[OpenLineage/OpenLineage] Pull request opened by julienledem
Comments: 1
Reviewers: mobuchowski, OleksandrDvornik, collado-mike

GitHub
2021-08-04 23:11:04
[OpenLineage/OpenLineage] Issue closed by julienledem

GitHub
2021-08-04 23:11:05
[OpenLineage/OpenLineage] Pull request merged by julienledem

GitHub
2021-08-05 05:12:56
[OpenLineage/OpenLineage] Pull request opened by wslulciuc

GitHub
2021-08-05 05:22:48
[OpenLineage/OpenLineage] Issue opened by fm100
Labels: proposal

GitHub
2021-08-05 12:04:23
[OpenLineage/OpenLineage] Pull request ready for review by mobuchowski

GitHub
2021-08-05 14:03:59
[OpenLineage/OpenLineage] New release Release - 0.0.1-rc7 published by mobuchowski

GitHub
2021-08-05 14:04:10
[OpenLineage/OpenLineage] Issue opened by julienledem

GitHub
2021-08-05 14:04:18
[OpenLineage/OpenLineage] Issue opened by julienledem

GitHub
2021-08-05 14:04:31
[OpenLineage/OpenLineage] Issue opened by julienledem

GitHub
2021-08-05 14:05:58
[OpenLineage/OpenLineage] Issue opened by julienledem

GitHub
2021-08-05 14:11:09
[OpenLineage/OpenLineage] Issue opened by julienledem

GitHub
2021-08-05 14:45:05
[OpenLineage/OpenLineage] Issue opened by julienledem
Labels: proposal

GitHub
2021-08-05 15:04:34
[OpenLineage/OpenLineage] New release Release - 0.0.1-rc7 published by mobuchowski

GitHub
2021-08-05 15:12:16
[OpenLineage/OpenLineage] New release Release - 0.0.1-rc7 published by mobuchowski

GitHub
2021-08-05 15:16:52
[OpenLineage/OpenLineage] New release Release - 0.0.1-rc8 published by mobuchowski

GitHub
2021-08-05 15:19:52
[OpenLineage/OpenLineage] New release Release - 0.0.1-rc8 published by mobuchowski

GitHub
2021-08-06 08:06:00
[OpenLineage/OpenLineage] Pull request ready for review by mobuchowski

GitHub
2021-08-06 08:06:33
[OpenLineage/OpenLineage] Issue closed by mobuchowski

GitHub
2021-08-06 08:26:27
[OpenLineage/OpenLineage] Issue closed by mobuchowski

GitHub
2021-08-06 08:48:31
[OpenLineage/OpenLineage] Issue closed by mobuchowski

GitHub
2021-08-06 08:48:46
[OpenLineage/OpenLineage] Issue closed by mobuchowski

GitHub
2021-08-06 09:04:26
[OpenLineage/OpenLineage] Issue closed by mobuchowski

GitHub
2021-08-06 12:46:34
[OpenLineage/OpenLineage] Pull request merged by collado-mike

GitHub
2021-08-06 19:34:30
[OpenLineage/OpenLineage] Pull request merged by julienledem

GitHub
2021-08-09 09:22:51
[OpenLineage/OpenLineage] Pull request opened by mobuchowski
Reviewers: wslulciuc

GitHub
2021-08-09 10:15:56
[OpenLineage/OpenLineage] Pull request opened by mobuchowski

GitHub
2021-08-09 11:43:19
[OpenLineage/OpenLineage] Issue opened by mandy-chessell

GitHub
2021-08-09 16:12:10
[OpenLineage/website] Pull request opened by collado-mike
Reviewers: julienledem, rossturk

GitHub
2021-08-09 19:38:47
[OpenLineage/OpenLineage] Issue opened by julienledem
Labels: proposal

GitHub
2021-08-10 07:55:28
[OpenLineage/OpenLineage] Pull request opened by mobuchowski
Reviewers: wslulciuc

GitHub
2021-08-10 13:02:18
[OpenLineage/OpenLineage] Issue opened by collado-mike
Labels: proposal

GitHub
2021-08-10 13:18:11
[OpenLineage/OpenLineage] Pull request merged by wslulciuc

GitHub
2021-08-10 13:32:15
[OpenLineage/website] Pull request merged by collado-mike

GitHub
2021-08-10 14:01:32
[OpenLineage/OpenLineage] Pull request merged by wslulciuc

GitHub
2021-08-10 14:10:57
[OpenLineage/OpenLineage] Issue closed by wslulciuc

GitHub
2021-08-10 14:11:40
[OpenLineage/OpenLineage] Issue closed by wslulciuc

GitHub
2021-08-10 14:13:42
[OpenLineage/OpenLineage] Pull request merged by wslulciuc
-
- - - - -
- -
GitHub - -
-
2021-08-10 14:16:44
-
-

[OpenLineage/OpenLineage] Issue closed by wslulciuc

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-10 20:10:51
-
-

[OpenLineage/OpenLineage] Pull request opened by collado-mike

-
- - - - - - - - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-10 20:25:57
-
-

[OpenLineage/OpenLineage] Issue opened by julienledem

-
- - - - - - - -
-
Labels
- enhancement -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-10 20:31:36
-
-

[OpenLineage/OpenLineage] Pull request merged by wslulciuc

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-10 20:33:31
-
-

[OpenLineage/OpenLineage] Pull request merged by wslulciuc

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-10 20:35:01
-
-

[OpenLineage/OpenLineage] Issue closed by wslulciuc

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-10 20:35:33
-
-

[OpenLineage/OpenLineage] Issue closed by wslulciuc

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 11:10:18
-
-

[OpenLineage/OpenLineage] Pull request opened by wslulciuc

-
- - - - - - - -
-
Reviewers
- julienledem -
- - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 11:36:48
-
-

[OpenLineage/OpenLineage] Pull request opened by wslulciuc

-
- - - - - - - -
-
Comments
- 1 -
- -
-
Reviewers
- julienledem -
- - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 11:44:36
-
-

[OpenLineage/OpenLineage] Pull request merged by wslulciuc

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 13:52:10
-
-

[OpenLineage/OpenLineage] Pull request merged by collado-mike

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 14:38:57
-
-

[OpenLineage/OpenLineage] Pull request opened by collado-mike

-
- - - - - - - -
-
Reviewers
- julienledem -
- - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 14:50:46
-
-

[OpenLineage/OpenLineage] Issue closed by fm100

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 16:08:11
-
-

[OpenLineage/OpenLineage] Pull request merged by collado-mike

-
- - - - - - - - - - - - - - - - -
- - - -
- ❤️ Julien Le Dem -
- -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 16:25:48
-
-

[OpenLineage/OpenLineage] Pull request opened by wslulciuc

-
- - - - - - - -
-
Reviewers
- julienledem -
- - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 16:35:44
-
-

[OpenLineage/OpenLineage] Issue closed by julienledem

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 16:35:46
-
-

[OpenLineage/OpenLineage] Pull request merged by julienledem

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 16:43:39
-
-

[OpenLineage/OpenLineage] Pull request opened by wslulciuc

-
- - - - - - - -
-
Reviewers
- julienledem -
- - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - - - - - - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 17:40:51
-
-

[OpenLineage/OpenLineage] Pull request opened by collado-mike

-
- - - - - - - -
-
Comments
- 3 -
- - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 17:57:48
-
-

[OpenLineage/OpenLineage] Pull request merged by collado-mike

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 18:01:03
-
-

[OpenLineage/OpenLineage] Pull request merged by wslulciuc

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 19:17:23
-
-

[OpenLineage/OpenLineage] Issue opened by julienledem

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 19:17:27
-
-

[OpenLineage/OpenLineage] Issue opened by julienledem

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 19:21:29
-
-

[OpenLineage/OpenLineage] Issue opened by julienledem

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 19:23:16
-
-

[OpenLineage/OpenLineage] Issue opened by julienledem

- - - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 19:27:06
-
-

[OpenLineage/OpenLineage] Issue opened by collado-mike

-
- - - - - - - -
-
Labels
- proposal -
- -
-
Comments
- 1 -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 19:27:37
-
-

[OpenLineage/OpenLineage] Pull request opened by wslulciuc

-
- - - - - - - -
-
Reviewers
- julienledem -
- - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 20:05:28
-
-

[OpenLineage/OpenLineage] Issue opened by collado-mike

-
- - - - - - - -
-
Comments
- 3 -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 20:05:40
-
-

[OpenLineage/OpenLineage] Issue opened by mstrbac

-
- - - - - - - -
-
Comments
- 2 -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 20:05:50
-
-

[OpenLineage/OpenLineage] Issue opened by julienledem

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 20:06:04
-
-

[OpenLineage/OpenLineage] Issue opened by mobuchowski

-
- - - - - - - -
-
Assignees
- mobuchowski -
- -
-
Comments
- 2 -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 20:06:18
-
-

[OpenLineage/OpenLineage] Issue opened by julienledem

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 20:06:31
-
-

[OpenLineage/OpenLineage] Issue opened by julienledem

-
- - - - - - - -
-
Comments
- 1 -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 20:06:39
-
-

[OpenLineage/OpenLineage] Issue opened by julienledem

-
- - - - - - - -
-
Comments
- 1 -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 20:06:50
-
-

[OpenLineage/OpenLineage] Issue opened by julienledem

-
- - - - - - - -
-
Assignees
- mobuchowski -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 20:07:06
-
-

[OpenLineage/OpenLineage] Issue opened by julienledem

-
- - - - - - - -
-
Assignees
- mobuchowski -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 20:07:17
-
-

[OpenLineage/OpenLineage] Issue opened by mobuchowski

-
- - - - - - - -
-
Comments
- 3 -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 20:07:26
-
-

[OpenLineage/OpenLineage] Issue opened by collado-mike

-
- - - - - - - -
-
Assignees
- OleksandrDvornik -
- -
-
Comments
- 1 -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 20:07:40
-
-

[OpenLineage/OpenLineage] Issue opened by collado-mike

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 20:07:49
-
-

[OpenLineage/OpenLineage] Issue opened by julienledem

-
- - - - - - - -
-
Assignees
- mobuchowski -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 20:08:01
-
-

[OpenLineage/OpenLineage] Issue opened by nizardeen

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 20:08:13
-
-

[OpenLineage/OpenLineage] Issue opened by julienledem

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 20:08:22
-
-

[OpenLineage/OpenLineage] Issue opened by pomdtr

-
- - - - - - - -
-
Comments
- 2 -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-11 21:00:24
-
-

[OpenLineage/OpenLineage] Pull request opened by julienledem

-
- - - - - - - - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-12 06:20:57
-
-

[OpenLineage/OpenLineage] Issue opened by OleksandrDvornik

-
- - - - - - - -
-
Labels
- proposal -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-12 12:16:02
-
-

[OpenLineage/OpenLineage] Pull request merged by wslulciuc

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-12 12:58:26
-
-

[OpenLineage/OpenLineage] Issue closed by collado-mike

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-12 15:46:56
-
-

[OpenLineage/OpenLineage] Pull request ready for review by wslulciuc

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-12 15:52:41
-
-

[OpenLineage/OpenLineage] Pull request merged by wslulciuc

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-12 15:58:24
-
-

[OpenLineage/OpenLineage] Pull request merged by wslulciuc

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-12 16:11:24
-
-

[OpenLineage/OpenLineage] Pull request opened by wslulciuc

-
- - - - - - - -
-
Comments
- 1 -
- -
-
Reviewers
- julienledem, collado-mike -
- - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-12 16:48:50
-
-

[OpenLineage/OpenLineage] Pull request merged by wslulciuc

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-12 19:09:38
-
-

[OpenLineage/OpenLineage] Pull request merged by julienledem

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-12 19:18:17
-
-

[OpenLineage/OpenLineage] Pull request ready for review by wslulciuc

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-12 19:21:21
-
-

[OpenLineage/OpenLineage] Pull request merged by wslulciuc

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-12 19:37:04
-
-

[OpenLineage/OpenLineage] Pull request opened by wslulciuc

-
- - - - - - - - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-12 19:45:14
-
-

[OpenLineage/OpenLineage] Pull request merged by collado-mike

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-12 19:51:55
-
-

[OpenLineage/OpenLineage] Pull request opened by wslulciuc

-
- - - - - - - -
-
Comments
- 1 -
- - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-12 20:00:57
-
-

[OpenLineage/OpenLineage] Issue opened by julienledem

-
- - - - - - - -
-
Assignees
- mobuchowski -
- -
-
Labels
- proposal -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-12 20:13:54
-
-

[OpenLineage/OpenLineage] Issue opened by julienledem

-
- - - - - - - -
-
Assignees
- mobuchowski -
- -
-
Labels
- proposal -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-12 20:15:26
-
-

[OpenLineage/OpenLineage] Pull request merged by collado-mike

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-12 20:32:07
-
-

[OpenLineage/OpenLineage] Pull request opened by wslulciuc

-
- - - - - - - -
-
Reviewers
- collado-mike -
- - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-12 20:40:53
-
-

[OpenLineage/OpenLineage] Pull request merged by wslulciuc

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-12 20:59:34
-
-

[OpenLineage/OpenLineage] Pull request opened by wslulciuc

-
- - - - - - - -
-
Comments
- 1 -
- - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-12 21:16:28
-
-

[OpenLineage/OpenLineage] Pull request merged by wslulciuc

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-13 13:46:20
-
-

[OpenLineage/OpenLineage] Pull request ready for review by wslulciuc

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-13 13:46:31
-
-

[OpenLineage/OpenLineage] Pull request merged by wslulciuc

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-13 13:51:46
-
-

[OpenLineage/OpenLineage] Pull request closed by wslulciuc

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-13 13:53:48
-
-

[OpenLineage/OpenLineage] Pull request ready for review by wslulciuc

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-13 14:36:22
-
-

[OpenLineage/OpenLineage] Pull request merged by wslulciuc

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-13 14:46:37
-
-

[OpenLineage/OpenLineage] Pull request ready for review by wslulciuc

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-13 15:40:16
-
-

[OpenLineage/OpenLineage] Pull request merged by wslulciuc

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-13 18:43:03
-
-

[OpenLineage/OpenLineage] New release Release - OpenLineage 0.1.0 published by wslulciuc

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-13 18:45:37
-
-

[OpenLineage/OpenLineage] Pull request opened by julienledem

-
- - - - - - - -
-
Reviewers
- wslulciuc, collado-mike -
- - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-13 18:51:26
-
-

[OpenLineage/OpenLineage] Pull request opened by wslulciuc

-
- - - - - - - - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-13 18:58:07
-
-

[OpenLineage/OpenLineage] Pull request merged by wslulciuc

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-16 11:17:08
-
-

[OpenLineage/OpenLineage] Pull request opened by mobuchowski

-
- - - - - - - -
-
Reviewers
- julienledem, wslulciuc -
- - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-16 11:30:59
-
-

[OpenLineage/OpenLineage] Pull request opened by OleksandrDvornik

-
- - - - - - - - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-17 08:22:09
-
-

[OpenLineage/OpenLineage] Issue closed by mobuchowski

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-17 08:22:23
-
-

[OpenLineage/OpenLineage] Issue closed by mobuchowski

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-17 11:29:35
-
-

[OpenLineage/OpenLineage] Pull request merged by mobuchowski

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-17 12:05:29
-
-

[OpenLineage/OpenLineage] Issue opened by mobuchowski

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-18 13:24:48
-
-

[OpenLineage/OpenLineage] Pull request opened by mobuchowski

-
- - - - - - - -
-
Reviewers
- julienledem, wslulciuc -
- - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-18 14:16:51
-
-

[OpenLineage/OpenLineage] Issue opened by julienledem

-
- - - - - - - -
-
Labels
- proposal -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-18 14:22:13
-
-

[OpenLineage/OpenLineage] Issue opened by julienledem

-
- - - - - - - -
-
Labels
- proposal -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-18 17:38:09
-
-

[OpenLineage/OpenLineage] Pull request opened by wslulciuc

-
- - - - - - - -
-
Reviewers
- julienledem -
- - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - - - -
-
- - - - -
- -
GitHub - -
-
2021-08-18 20:04:37
-
-

[OpenLineage/OpenLineage] Pull request merged by wslulciuc

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2021-09-03 23:41:38
-
-

*Thread Reply:* If anyone has any more feedback on the website, feel free to add it to https://github.com/OpenLineage/website/issues 🙂

- - - -
-
-
-
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-10-01 15:17:07
-
-

*Thread Reply:* @Julien Le Dem this one's ready for 👀

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-10-01 15:26:33
-
-

*Thread Reply:* @Michael Collado I’ve also added you as a reviewer on the PR to get your thoughts

- - - -
-
-
-
-
- - - - -
- -
GitHub - -
-
2021-11-15 12:30:14
-
-

[OpenLineage/metrics] is now public!

-
- - -
- - - - - OpenLineage - (https://github.com/OpenLineage) -
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
-
- - - - -
- -
GitHub - -
-
2022-07-14 12:26:23
-
-

[OpenLineage/docs] is now public!

-
- - -
- - - - - OpenLineage - (https://github.com/OpenLineage) -
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
GitHub
2023-11-20 16:40:42

[OpenLineage/slack-archives] is now public!

OpenLineage (https://github.com/OpenLineage)