
[SPARK-50416][CORE] A more portable terminal / pipe test needed for bin/load-spark-env.sh #48937

Open · wants to merge 1 commit into base: master
Conversation

@philwalk (Contributor) commented Nov 22, 2024

What changes were proposed in this pull request?

The last action in bin/load-spark-env.sh tests whether the script is running in a terminal and whether stdin is reading from a pipe. A more portable test is needed.

Why are the changes needed?

The current approach relies on ps with options that vary significantly between Unix-like systems. Specifically, it prints an error message in both Cygwin and MSYS2 (and, by extension, in all variations of Git for Windows). On Linux and macOS (Darwin/Homebrew) it does not print an error message, but it fails to detect a terminal session (it always thinks STDIN is a pipe).
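For reference, the check in question looks roughly like the sketch below. This is a hypothetical reconstruction pieced together from the set -x traces that follow; the exact code in bin/load-spark-env.sh may differ.

```shell
#!/bin/bash
# Hypothetical reconstruction of the non-portable check, based on the
# set -x traces below; the real bin/load-spark-env.sh may differ.
# On procps-style systems, "ps -o stat= -p <pid>" prints a STAT field
# containing "+" for foreground (terminal) processes, but the ps shipped
# with Cygwin/MSYS2 does not support -o, so $stat comes back empty there.
stat="$(ps -o stat= -p $$ 2>/dev/null)"
if [[ ! "$stat" =~ \+ ]]; then     # no foreground terminal detected...
  if [[ ! -p /dev/stdin ]]; then   # ...and stdin is not a pipe either
    # Tell jline there is no usable terminal.
    export SPARK_BEELINE_OPTS="$SPARK_BEELINE_OPTS -Djline.terminal=jline.UnsupportedTerminal"
  fi
fi
```

Because $stat is empty under Cygwin/MSYS2 (and never contains "+" in the failing Linux/macOS cases), the regex test alone cannot distinguish a terminal from a pipe, which is why the -p /dev/stdin test ends up deciding the outcome.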

Here's what the problem looks like in a cygwin64 session (with set -x just ahead of the section of interest):

If called directly:

$ bin/load-spark-env.sh
++ ps -o stat= -p 1947
ps: unknown option -- o
Try `ps --help' for more information.
+ [[ ! '' =~ \+ ]]
+ [[ -p /dev/stdin ]]
+ export 'SPARK_BEELINE_OPTS= -Djline.terminal=jline.UnsupportedTerminal'
+ SPARK_BEELINE_OPTS=' -Djline.terminal=jline.UnsupportedTerminal'

Interestingly, thanks to the two-part test, it still does the right thing with respect to the terminal check; the main problem is the error message.
If called downstream from a pipe:

$ echo "yo" | bin/load-spark-env.sh
++ ps -o stat= -p 1955
ps: unknown option -- o
Try `ps --help' for more information.
+ [[ ! '' =~ \+ ]]
+ [[ -p /dev/stdin ]]

Again, it correctly detects the pipe environment, but with an error message.

In WSL2 Ubuntu, the test doesn't correctly detect a non-pipe terminal session:

# /opt/spark$ bin/load-spark-env.sh
++ ps -o stat= -p 1423
+ [[ ! S+ =~ \+ ]]
# echo "yo!" | bin/load-spark-env.sh
++ ps -o stat= -p 1416
+ [[ ! S+ =~ \+ ]]

On native Ubuntu (#134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024), the same failure occurs (terminal environments are not recognized).

Does this PR introduce any user-facing change?

This is a proposed bug fix, and, other than fixing the bug, should be invisible to users.

How was this patch tested?

The patch was verified to behave as intended in both interactive and piped terminal sessions in the following 5 environments:


- Linux quadd 5.15.0-124-generic #134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
- Linux d5 5.15.153.1-microsoft-standard-WSL2 #1 SMP Fri Mar 29 23:14:13 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
- MINGW64_NT-10.0-22631 d5 3.5.4-0bc1222b.x86_64 2024-09-04 18:28 UTC x86_64 Msys
- CYGWIN_NT-10.0-22631 d5 3.5.3-1.x86_64 2024-04-03 17:25 UTC x86_64 Cygwin
- Darwin suemac.local 23.6.0 Darwin Kernel Version 23.6.0: Mon Jul 29 21:14:21 PDT 2024; root:xnu-10063.141.2~1/RELEASE_ARM64_T8103 arm64

The test was to manually run the following script and verify the expected response in both piped and interactive terminal sessions:

#!/bin/bash
# Report "not a pipe" only when a tty program exists, it reports a real
# terminal, and stdin is not a pipe; otherwise treat stdin as piped.
if [ -e /usr/bin/tty -a "`tty`" != "not a tty" -a ! -p /dev/stdin ]; then
  echo "not a pipe"
else
  echo "is a pipe"
fi

The output of the manual test was the same in all 5 environments:

philwalk@quadd:/opt/spark
$ isPipe
not a pipe
$ echo "yo" | isPipe
is a pipe
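For comparison only (not part of the patch), bash has a built-in way to ask a similar question: [ -t 0 ] tests whether file descriptor 0 (stdin) is attached to a terminal, sidestepping both ps and the external tty program. A minimal sketch:

```shell
#!/bin/bash
# Alternative sketch, not part of this PR: "-t 0" is true when stdin
# (file descriptor 0) is attached to a terminal.  In a pipeline, fd 0
# is the pipe itself, so the test is false.
is_pipe() {
  if [ -t 0 ] && [ ! -p /dev/stdin ]; then
    echo "not a pipe"
  else
    echo "is a pipe"
  fi
}

echo "yo" | is_pipe   # prints "is a pipe"
```

The PR keeps the tty / -p /dev/stdin form instead, possibly because -t behavior under Cygwin/MSYS2 pseudo-terminals is less predictable than an explicit tty query.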

Was this patch authored or co-authored using generative AI tooling?

No

@philwalk (Contributor, Author):

Not sure what "workflow run detection" is, or how to correct it.
The output of git remote -vv is:

# git remote -vv
origin  https://github.com/philwalk/spark.git (fetch)
origin  https://github.com/philwalk/spark.git (push)

@dongjoon-hyun (Member) left a comment:

Thank you for making a PR, @philwalk .

Do you enable GitHub Action in your repository?

There is a community documentation about Pull Request. Please check the section, Pull Request.

@philwalk (Contributor, Author):

I enabled Build and Test on my forked repo.

@HyukjinKwon (Member):

Let's also file a JIRA, and add it into the PR description, see also https://spark.apache.org/contributing.html

@pan3793 (Member) commented Nov 25, 2024:

The touched code seems to be borrowed from Apache Hive; would you like to also report the issue to the Hive community?

@philwalk (Contributor, Author):

> Let's also file a JIRA, and add it into the PR description, see also https://spark.apache.org/contributing.html

I requested a JIRA account; it has not yet been created ...

@dongjoon-hyun (Member):

Got it. I approved it a few minutes ago, @philwalk .

[Screenshot: approval of workflow runs, 2024-11-25 09:39]

@philwalk (Contributor, Author):

Filed the JIRA bug report: SPARK-50416.

@philwalk (Contributor, Author) commented Nov 25, 2024:

> The touched code seems to be borrowed from Apache Hive, would you like to also report the issue to the Hive community?

Not at this time. It seems that Hive needs work to be compatible with Cygwin or MSYS2 bash shell environments; there are a number of problems, and I'm not able to work on it at this time.

By contrast, spark-shell starts right up (after applying this PR):

$ bin/spark-shell
WARNING: Using incubator modules: jdk.incubator.vector
Using Spark's default log4j profile: org/apache/spark/log4j2-pattern-layout-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 4.0.0-preview2
      /_/

Using Scala version 2.13.14 (OpenJDK 64-Bit Server VM, Java 21.0.4)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context Web UI available at http://d5:4040
Spark context available as 'sc' (master = local[*], app id = local-1732575351925).
Spark session available as 'spark'.

scala>

@HyukjinKwon HyukjinKwon changed the title A more portable terminal / pipe test needed for bin/load-spark-env.sh [SPARK-50416][CORE] A more portable terminal / pipe test needed for bin/load-spark-env.sh Nov 26, 2024