Introduction
As Microsoft Fabric continues to gain popularity, many organizations are either migrating their workloads to Fabric or already running production systems on it. With this shift, a common real-world requirement emerges: moving all tables from a SQL database into a Fabric Lakehouse.
In many cases, mirroring the entire SQL database is not feasible, or simply not required. Instead, the goal is to load all table data into a Lakehouse so it can be used for analytics, reporting, or downstream processing.
The first solution that usually comes to mind is Fabric Pipelines with Copy Activity. While this works well, there is another, much simpler and more flexible approach: using Fabric Notebooks with PySpark.
If you’re already familiar with this approach, great. If not, don’t worry. In this blog post, I’ll walk you through the entire process step by step.
High-Level Approach
Here’s what we’ll do:
➤ Create a SQL table that stores the list of all user tables in the database
➤ Create (or use an existing) Lakehouse with schema support enabled
➤ Use a PySpark notebook to dynamically read each SQL table and load it into the Lakehouse
Step 1: Create a Table to Store SQL Table Metadata
First, we’ll create a helper table that holds the schema and table names of all user tables in the database.
DROP TABLE IF EXISTS MoveToLakeHouse;
CREATE TABLE MoveToLakeHouse
(
ID INT IDENTITY(1,1) NOT NULL,
SchemaName VARCHAR(50),
TableName VARCHAR(150)
)
Instead of manually inserting table names (which can be tedious in large databases), we can populate this table using system catalog views.
INSERT INTO MoveToLakeHouse (SchemaName, TableName)
SELECT
    S.name AS SchemaName,
    T.name AS TableName
FROM sys.tables T
INNER JOIN sys.schemas S
    ON T.schema_id = S.schema_id
WHERE T.type_desc = 'USER_TABLE'
Once this query is executed, the MoveToLakeHouse table will contain one row for every user table in the database, along with its schema.
Step 2: Create the Lakehouse
If you already have a Lakehouse, you can skip this step. If not, create a new Lakehouse in Fabric. My other blog post, Create and Explore a Lakehouse in Microsoft Fabric, explains this in detail.
Step 3: PySpark Notebook to Move Data
Now comes the core part of the solution: the PySpark code. I'll go through it line by line, and at the end I'll include the complete code for your convenience.
Import Required Libraries
import com.microsoft.sqlserver.jdbc.spark
from pyspark.sql.functions import col, lit, concat
In November, Fabric introduced a new Spark SQL connector (currently in preview): com.microsoft.sqlserver.jdbc.spark
This connector is a game changer: it allows Spark to read tables directly from Azure SQL Database or Fabric SQL endpoints without complex JDBC boilerplate. For more details, you can refer to Microsoft's official documentation.
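For comparison, here is a rough sketch of what the same read would look like with Spark's generic JDBC reader. The URL, table name, and credentials are placeholders, not part of the solution, and the exact authentication setup depends on your environment:
# Illustrative only: reading the metadata table with Spark's generic JDBC reader
# (placeholders for the URL and credentials; actual authentication depends on your environment)
jdbc_df = (
    spark.read.format("jdbc")
        .option("url", "jdbc:sqlserver://<server>:<port>;database=<database>")
        .option("dbtable", "dbo.MoveToLakeHouse")
        .option("user", "<user>")
        .option("password", "<password>")
        .load()
)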
Configure the SQL Connection
Set up the JDBC URL for your SQL database. In this example, we're using a Fabric SQL endpoint:
# Template: replace <server>, <port>, and <database> with your own values
url = 'jdbc:sqlserver://<server>:<port>;database=<database>'
# Example: a Fabric SQL database endpoint
url = 'jdbc:sqlserver://ti5uq43a7asurbfgo3qkr65fcy-gq3os2f6y2eereagwkps7awaoa.database.fabric.microsoft.com:1433;database=SQLDatabase-2d6be67e-b05c-4c87-be60-4c77d30e3ab2'
Read the Metadata Table
Next, read the MoveToLakeHouse table into a DataFrame, bring the results to the driver node with the collect() method, and print the list of tables that will be migrated to the Lakehouse.
# Read the metadata table using the Spark SQL connector
df = spark.read.option("url", url).mssql("MoveToLakeHouse")
# Build fully qualified names (SchemaName.TableName) and collect the rows to the driver
df_collect = df.select(col("ID"), col("SchemaName"), concat(col("SchemaName"), lit("."), col("TableName")).alias("TableName")).collect()
print(df_collect)
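If you only want to migrate a subset of tables, you can filter the metadata DataFrame before collecting. Here is a minimal sketch; the schema name "dbo" is just an example:
# Optional: restrict the migration to a single schema (example filter, adjust as needed)
df_collect = (
    df.filter(col("SchemaName") == "dbo")
      .select(
          col("ID"),
          col("SchemaName"),
          concat(col("SchemaName"), lit("."), col("TableName")).alias("TableName"),
      )
      .collect()
)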
Loop Through Tables and Load Them into the Lakehouse
Now we iterate through each row and process the tables one by one; the loop code is shown right after the list below.
What’s happening here:
➤ We loop through each table from the metadata list
➤ The SQL table is read using the Spark SQL connector
➤ Data is written to the Lakehouse with the same schema and table name
➤ Existing tables are overwritten to keep the process idempotent
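Here is the loop itself (the same code is included in the complete listing at the end of the post):
for row in df_collect:
    TableName = row.TableName        # fully qualified name, e.g. dbo.Customer
    SchemaName = row.SchemaName

    # Make sure the target schema exists in the Lakehouse
    spark.sql("CREATE SCHEMA IF NOT EXISTS {0}".format(SchemaName))

    # Read the SQL table via the Spark SQL connector and write it to the Lakehouse
    table_DF = spark.read.option("url", url).mssql(TableName)
    table_DF.write.mode("overwrite").saveAsTable(TableName)

    print(f"{TableName} --> Saved to LakeHouse With Name --> {TableName}")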
Once the notebook finishes, refresh the Lakehouse and you’ll see all SQL tables available there with the same structure.
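As an optional check from the notebook itself, you can list the tables created in one of the Lakehouse schemas; "dbo" below is just an example schema name:
# Quick sanity check: list the tables created in a given Lakehouse schema
spark.sql("SHOW TABLES IN dbo").show()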
Conclusion:
Using a Fabric notebook with PySpark provides a clean, scalable, and highly flexible way to move all SQL tables into a Lakehouse—without relying on mirroring or complex pipeline setups. This approach works especially well when you want full control over schema handling and table naming.
Complete Code:
SQL code:
DROP TABLE IF EXISTS MoveToLakeHouse;
CREATE TABLE MoveToLakeHouse
(
ID INT IDENTITY(1,1) NOT NULL,
SchemaName VARCHAR(50),
TableName VARCHAR(150)
)
INSERT INTO MoveToLakeHouse (SchemaName, TableName)
SELECT
    S.name AS SchemaName,
    T.name AS TableName
FROM sys.tables T
INNER JOIN sys.schemas S
    ON T.schema_id = S.schema_id
WHERE T.type_desc = 'USER_TABLE'
PySpark code:
import com.microsoft.sqlserver.jdbc.spark
from pyspark.sql.functions import col, lit, concat

url = "jdbc:sqlserver://ti5uq43a7asurbfgo3qkr65fcy-gq3os2f6y2eereagwkps7awaoa.database.fabric.microsoft.com:1433;database=SQLDatabase-2d6be67e-b05c-4c87-be60-4c77d30e3ab2"

df = spark.read.option("url", url).mssql("MoveToLakeHouse")
df_collect = df.select(col("ID"), col("SchemaName"), concat(col("SchemaName"), lit("."), col("TableName")).alias("TableName")).collect()
print(df_collect)

for row in df_collect:
    TableName = row.TableName
    SchemaName = row.SchemaName
    spark.sql("CREATE SCHEMA IF NOT EXISTS {0}".format(SchemaName))
    table_DF = spark.read.option("url", url).mssql(TableName)
    table_DF.write.mode("overwrite").saveAsTable(TableName)
    print(f"{TableName} --> Saved to LakeHouse With Name --> {TableName}")