Move Azure SQL Tables to Fabric Lakehouse Using PySpark (Without Mirroring)

Introduction

As Microsoft Fabric continues to gain popularity, many organizations are either migrating their workloads to Fabric or already running production systems on it. With this shift, a common real-world requirement emerges:

How do we move all SQL tables from an Azure SQL Database or Fabric SQL endpoint into a Lakehouse—while keeping the same table names and schema structure—without relying on database mirroring?

In many cases, mirroring the entire SQL database is not feasible or not even required. Instead, the goal is simply to load all table data into a Lakehouse so it can be used for analytics, reporting, or downstream processing.

The first solution that usually comes to mind is Fabric Pipelines with Copy Activity. While this works well, there’s another, much simpler and more flexible approach—using Fabric Notebooks with PySpark.

If you’re already familiar with this approach, great. If not, don’t worry. In this blog post, I’ll walk you through the entire process step by step.

High-Level Approach

Here’s what we’ll do:
Create a SQL table that stores the list of all user tables in the database
Create (or use an existing) Lakehouse with schema support enabled
Use a PySpark notebook to dynamically read each SQL table and load it into the Lakehouse

Step 1: Create a Table to Store SQL Table Metadata

First, we’ll create a helper table that holds the schema and table names of all user tables in the database.

DROP TABLE IF EXISTS MoveToLakeHouse;
CREATE TABLE MoveToLakeHouse
(
    ID INT IDENTITY(1,1) NOT NULL,
    SchemaName VARCHAR(50),
    TableName VARCHAR(150)
)

Instead of manually inserting table names (which can be tedious in large databases), we can populate this table using system catalog views.

INSERT INTO MoveToLakeHouse (SchemaName, TableName)
SELECT
    S.name AS SchemaName,
    T.name AS TableName
FROM sys.tables AS T
INNER JOIN sys.schemas AS S
    ON T.schema_id = S.schema_id
WHERE T.type_desc = 'USER_TABLE';

Once this query is executed, the MoveToLakeHouse table will contain one row for every user table in the database, along with its schema.

Step 2: Create the Lakehouse

If you already have a Lakehouse (with schema support enabled), you can skip this step. If not, create a new Lakehouse in Fabric; my other blog post, Create and Explore a Lakehouse in Microsoft Fabric, explains the process in detail.

Step 3: PySpark Notebook to Move Data

Now comes the core part of the solution: the PySpark code. I'll walk through it line by line, and the complete code is included at the end for your convenience.

Import Required Libraries

import com.microsoft.sqlserver.jdbc.spark
from pyspark.sql.functions import col, lit, concat

In November, Fabric introduced a new Spark SQL connector (currently in preview): com.microsoft.sqlserver.jdbc.spark

This connector is a game-changer—it allows Spark to directly read tables from Azure SQL Database or Fabric SQL endpoints without complex JDBC boilerplate. For more details, you can refer to Microsoft’s official documentation.
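For contrast, here is a rough sketch of the plain JDBC route this connector saves you from. The server, database, table name, and credentials below are placeholders, and the exact authentication options depend on your setup:

# Generic Spark JDBC read, shown for comparison only (placeholder names and credentials)
jdbc_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://<server>:1433;database=<database>")
    .option("dbtable", "dbo.SomeTable")       # hypothetical table
    .option("user", "<sql_user>")             # or token-based auth, depending on your setup
    .option("password", "<sql_password>")
    .load()
)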

Configure the SQL Connection

Set up the JDBC URL for your SQL database. In this example, we're using a Fabric SQL database endpoint.


# General format:
# url = 'jdbc:sqlserver://<server>:<port>;database=<database>'

# This example points at a Fabric SQL database endpoint:
url = 'jdbc:sqlserver://ti5uq43a7asurbfgo3qkr65fcy-gq3os2f6y2eereagwkps7awaoa.database.fabric.microsoft.com:1433;database=SQLDatabase-2d6be67e-b05c-4c87-be60-4c77d30e3ab2'
  

Read the Metadata Table

Next, read the MoveToLakeHouse table into a DataFrame, build the fully qualified table name for each row, and bring the results to the driver with collect(). Printing the collected rows shows the list of tables that will be migrated to the Lakehouse.

# Read the metadata table through the Spark SQL connector
df = spark.read.option("url", url).mssql("MoveToLakeHouse")

# Build the fully qualified name (Schema.Table) and collect the rows to the driver
df_collect = df.select(
    col("ID"),
    col("SchemaName"),
    concat(col("SchemaName"), lit("."), col("TableName")).alias("TableName")
).collect()

print(df_collect)
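If you'd rather see a tabular preview than a raw list of Row objects, a quick show() on the DataFrame works as well (purely an optional sanity check):

# Optional: preview the metadata before migrating anything
df.show(truncate=False)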
  

Loop Through Tables and Load Them into the Lakehouse

Now we iterate through each row and process the tables one by one.
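Here is the loop itself (the same code is included in the complete listing at the end of the post):

for row in df_collect:
    TableName = row.TableName        # fully qualified name, e.g. Schema.Table
    SchemaName = row.SchemaName

    # Make sure the target schema exists in the Lakehouse
    spark.sql("CREATE SCHEMA IF NOT EXISTS {0}".format(SchemaName))

    # Read the SQL table through the connector and write it to the Lakehouse
    table_DF = spark.read.option("url", url).mssql(TableName)
    table_DF.write.mode("overwrite").saveAsTable(TableName)

    print(f"{TableName} --> saved to the Lakehouse")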

What’s happening here:
We loop through each table from the metadata list
The SQL table is read using the Spark SQL connector
Data is written to the Lakehouse with the same schema and table name
Existing tables are overwritten to keep the process idempotent

Once the notebook finishes, refresh the Lakehouse and you’ll see all SQL tables available there with the same structure.
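If you want to confirm this from the notebook as well, a quick per-schema listing is enough. This is a minimal sketch and assumes the Lakehouse is attached as the notebook's default lakehouse:

# List the tables now visible in each schema of the attached Lakehouse
schemas = sorted({row.SchemaName for row in df_collect})
for schema in schemas:
    spark.sql(f"SHOW TABLES IN {schema}").show(truncate=False)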

Conclusion:

Using a Fabric notebook with PySpark provides a clean, scalable, and highly flexible way to move all SQL tables into a Lakehouse—without relying on mirroring or complex pipeline setups. This approach works especially well when you want full control over schema handling and table naming.

Complete Code:

SQL code:

DROP TABLE IF EXISTS MoveToLakeHouse;
CREATE TABLE MoveToLakeHouse
(
    ID INT IDENTITY(1,1) NOT NULL,
    SchemaName VARCHAR(50),
    TableName VARCHAR(150)
)
INSERT INTO MoveToLakeHouse (SchemaName, TableName)
SELECT
    S.name AS SchemaName,
    T.name AS TableName
FROM sys.tables AS T
INNER JOIN sys.schemas AS S
    ON T.schema_id = S.schema_id
WHERE T.type_desc = 'USER_TABLE';


  

pyspark code:

import com.microsoft.sqlserver.jdbc.spark
from pyspark.sql.functions import col, lit, concat

# Fabric SQL database endpoint
url = "jdbc:sqlserver://ti5uq43a7asurbfgo3qkr65fcy-gq3os2f6y2eereagwkps7awaoa.database.fabric.microsoft.com:1433;database=SQLDatabase-2d6be67e-b05c-4c87-be60-4c77d30e3ab2"

# Read the metadata table and build fully qualified table names (Schema.Table)
df = spark.read.option("url", url).mssql("MoveToLakeHouse")
df_collect = df.select(
    col("ID"),
    col("SchemaName"),
    concat(col("SchemaName"), lit("."), col("TableName")).alias("TableName")
).collect()
print(df_collect)

# Copy each SQL table into the Lakehouse with the same schema and table name
for row in df_collect:
    TableName = row.TableName
    SchemaName = row.SchemaName
    spark.sql("CREATE SCHEMA IF NOT EXISTS {0}".format(SchemaName))
    table_DF = spark.read.option("url", url).mssql(TableName)
    table_DF.write.mode("overwrite").saveAsTable(TableName)
    print(f"{TableName} --> saved to the Lakehouse")


  


