Running pytest with local spark session #1152

Open
rotemb-cye opened this issue Mar 21, 2024 · 5 comments

rotemb-cye commented Mar 21, 2024

Hey,

I am trying to run pytest on my local PC with the Databricks extension installed.
I am trying to create a local Spark session:


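import pytest
from pyspark.sql import SparkSession


def get_spark_session():
    # Build (or reuse) a Spark session that runs entirely on the local machine.
    spark = (
        SparkSession.builder.master("local[*]")
        .appName("local-tests")
        .config("spark.driver.bindAddress", "127.0.0.1")
        .getOrCreate()
    )
    return spark


# Marks have no effect on fixtures, so @pytest.mark.etl belongs on the tests instead.
@pytest.fixture(scope="session")
def spark_session():
    spark = get_spark_session()
    yield spark
    # Stop the session once all tests have finished.
    spark.stop()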

and I get the following error:
RuntimeError: Only remote Spark sessions using Databricks Connect are supported. Could not find connection parameters to start a Spark remote session.

How can I solve this? I want to be able to run my pytest suite while offline.

Thanks!

htuomola commented May 3, 2024

Hi, this is exactly the issue we have been struggling with. It seems that installing databricks-connect modifies the installed pyspark package so that it raises this error. I'm also interested in a workaround, because in its current state this basically blocks us from using Databricks Connect.
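
To check whether an environment is in the state described above, one option is to ask the packaging metadata which Spark-related distributions are installed. The following is a minimal sketch rather than an official diagnostic: the function names are hypothetical, and it assumes the packages were installed under the distribution names `databricks-connect` and `pyspark`.

```python
from importlib import metadata
from typing import Callable, Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def describe_spark_install(
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> str:
    """Summarise which Spark-related distributions are visible in this environment."""
    connect = lookup("databricks-connect")
    pyspark = lookup("pyspark")
    if connect is not None:
        # Assumption taken from this thread: with databricks-connect present,
        # building a local SparkSession is likely to raise the RuntimeError.
        return f"databricks-connect {connect} is installed; local sessions are likely to fail"
    if pyspark is not None:
        return f"pyspark {pyspark} is installed without databricks-connect"
    return "Neither pyspark nor databricks-connect is installed"


if __name__ == "__main__":
    print(describe_spark_install())
```

Running this with the same interpreter that pytest uses shows which case applies; the `lookup` parameter exists only so the logic can be exercised without changing the environment.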

@benoitLebreton-perso

Hello, I managed to get my local Spark session working with the following command from the VS Code command palette:

[image: screenshot of the command palette entry]

In fact, even uninstalling the extension did not fix it.

odimko commented Aug 27, 2024

Thanks for your solution, @benoitLebreton-perso! Do you know how to fix the issue when running pytest from the command line?

@bestekov

@benoitLebreton-perso, did you manage to have two versions of pyspark installed? Or did you go the route of uninstalling databricks-connect?

The big issue seems to be that installing databricks-connect uninstalls the rest of the full pyspark, which is extremely annoying. It would be much better to sideload and patch commands only when they are invoked in a databricks-connect context.

@benoitLebreton-perso

I uninstalled databricks-connect. I work in a local environment with a local PySpark session. I now use the Databricks Spark session only in notebooks, and I sync my local code with my repos.
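
For anyone who wants to keep databricks-connect elsewhere, a possible variation on this approach is a separate virtual environment that only ever receives plain pyspark. This is a sketch under assumptions, not something confirmed in this thread: the helper names and the `.venv-local-spark` directory are hypothetical, and whether the VS Code extension needs further configuration is not settled here.

```python
import os
import venv
from pathlib import Path


def env_python(env_dir: Path) -> Path:
    """Path to the interpreter inside a virtual environment created by venv."""
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def create_local_test_env(env_dir: Path, with_pip: bool = True) -> Path:
    """Create the environment if it does not exist yet and return its interpreter."""
    python = env_python(env_dir)
    if not python.exists():
        # with_pip=True bootstraps pip so packages can be installed afterwards.
        venv.EnvBuilder(with_pip=with_pip).create(env_dir)
    return python


if __name__ == "__main__":
    # ".venv-local-spark" is a hypothetical directory name.
    print(create_local_test_env(Path(".venv-local-spark")))
```

With the printed interpreter, running `-m pip install pyspark pytest` (and not databricks-connect) followed by `-m pytest` from the command line should exercise the tests against an unmodified pyspark.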
