|
How to Achieve High Performance with Linked Servers
By Kalman Toth, M.Phil. Physics, M.Phil. Computing Science, MCDBA, MCITP
February 10, 2009
Linked servers may be few feet apart in the same computer room,
or thousands of miles away, one in New York, the other in London.
Communications with a linked server are significantly slower than communications with a database
on the same server. The reason: instead of the motherboard and
native Windows communications, the information exchange has to
go through network boxes, security authentication and additional layers of communications
software.
You can join tables in different databases on different servers: cross database, cross server query.
Here is an example:
SELECT a.*,
b.*
FROM TableA a
INNER JOIN DATASRV888.accounting.dbo.TableB b
ON a.ColumnP = b.ColumnQ
For complex queries the performance may be unpredictable. That depends
how the query optimizer prepares the execution plan. SQL Server
may copy over the entire TableB before executing the join. If
the table is small, and the communications link is fast, it is
not an issue. However for large tables and/or slow links, it is
a performance killer. On a typical platform, to copy 5 GB of data
file from the linked server to another on the Windows level may
take 10 minutes. To copy over a table 5 GB from database to database
may take much longer during normal operations. Here is a simple test with small tables and linked servers on same physical server or connected with few feet of LAN cable through a network switch:
-- JOIN tables in same database
USE AdventureWorks;
DECLARE @StartTime datetime
DBCC DROPCLEANBUFFERS
SET @StartTime = getdate()
SELECT SlsOrds = count(*)
FROM Sales.SalesOrderHeader soh -- 31K
INNER JOIN Sales.SalesOrderDetail sod -- 121K
on soh.SalesOrderID = sod.SalesOrderID
SELECT DurationMsec = datediff(ms, @StartTime, getdate())
GO
-- 121313
-- 50 msec (avg)
-- JOIN tables in different databases on same server instance
DECLARE @StartTime datetime
DBCC DROPCLEANBUFFERS
SET @StartTime = getdate()
SELECT SlsOrds = count(*)
FROM Sales.SalesOrderHeader soh
INNER JOIN CopyALPHAOfAdventureWorks.Sales.SalesOrderDetail sod --121K
on soh.SalesOrderID = sod.SalesOrderID
SELECT DurationMsec = datediff(ms, @StartTime, getdate())
GO
-- 121313
-- 120 msec (avg)
Related msdn forum link:
http://social.msdn.microsoft.com/Forums/en/transactsql/thread/250dac80-f35c-46ea-9bf6-2489e055280d
-- JOIN tables in different databases on different server instances (linked server)
DECLARE @StartTime datetime
-- DBCC DROPCLEANBUFFERS -- EXECUTE it on DELLSTAR\ALPHA
DBCC DROPCLEANBUFFERS
SET @StartTime = getdate()
SELECT SlsOrds = count(*)
FROM Sales.SalesOrderHeader soh
INNER JOIN [DELLSTAR\ALPHA].AdventureWorks.Sales.SalesOrderDetail sod --121K
on soh.SalesOrderID = sod.SalesOrderID
SELECT DurationMsec = datediff(ms, @StartTime, getdate())
GO
-- 121313
-- 500 msec (avg)
-- JOIN tables in different databases on different servers (linked server)
DECLARE @StartTime datetime
-- DBCC DROPCLEANBUFFERS – EXECUTE it on GATESTAR
DBCC DROPCLEANBUFFERS
SET @StartTime = getdate()
SELECT SlsOrds = count(*)
FROM Sales.SalesOrderHeader soh
INNER JOIN [GATESTAR].AdventureWorks.Sales.SalesOrderDetail sod --121K
on soh.SalesOrderID = sod.SalesOrderID
SELECT DurationMsec = datediff(ms, @StartTime, getdate())
GO
-- 121313
-- 400 msec (avg)
-- Execute stored procedure on linked server (Remote Procedure Call)
-- Linked server must be configured for RPC
-- Use Linked Server properties to set RPC /RPC OUT true
/* Create stored procedure on linked server GATESTAR
CREATE PROCEDURE sprocSalesOrderCount
AS
SELECT SlsOrds = count(*)
FROM AdventureWorks.Sales.SalesOrderHeader soh
INNER JOIN AdventureWorks.Sales.SalesOrderDetail sod --121K
on soh.SalesOrderID = sod.SalesOrderID
GO
*/
DECLARE @StartTime datetime
-- Execute DBCC DROPCLEANBUFFERS on GATESTAR
DBCC DROPCLEANBUFFERS
SET @StartTime = getdate()
EXECUTE [GATESTAR].AdventureWorks.dbo.sprocSalesOrderCount
SELECT DurationMsec = datediff(ms, @StartTime, getdate())
GO
-- 121313
-- 38 msec (avg)
The linked server performance on the same server is the slowest probably due to disk contention. The linked server performance 8 times slower for a cross database, cross server query.
There is another aspect to linked server performance. Even if the 38 msec RPC execution or the 400 msec cross server query execution time is acceptable, due to network delay, the response occasionally can increase to 20, 000 msec, for example, in other words subject to unpredictability.
There are several ways you can take control back.
First, you can build and execute a stored proc on the linked
server. That is fast. Example:
EXEC DATASRV888.accounting.dbo.WeeklyAccountingSummary
Second, you can pull the data you need from the linked server
into a temporary or staging table. After that you operate with
the data on same server at the usual speed. The advantage of this
approach is that you can execute this step individually and get
timing on the transfer. Example:
SELECT *
INTO #TableB
FROM DATASRV888.accounting.dbo.TableB
WHERE ColumnR = '2010'
Third, you can "push" the data from the linked server instead of "pulling" it. The advantage of this approach is that the data partitioning takes place on the linked server efficiently, and only the needed data is moved. One way to push the data is a stored procedure on the linked server which is called from the "local" server.
The discussion above assumed high speed linking (1 GB/min) between the servers. With slow links, you have to be extremely careful. Basically, you just want to pull a single row with a remote procedure call or with OPENQUERY. If you need some significant data transfer, the best to schedule it as night process or physical transfer on CD, DVD or external disk drive.
DECLARE @StartTime datetime;
DBCC DROPCLEANBUFFERS;
-- Execute DBCC DROPCLEANBUFFERS on GATESTAR
SET @StartTime = getdate();
SELECT * FROM OPENQUERY( GATESTAR,
'SELECT SlsOrds = count(*)
FROM AdventureWorks.Sales.SalesOrderHeader soh -- 31K
INNER JOIN AdventureWorks.Sales.SalesOrderDetail sod -- 121K
on soh.SalesOrderID = sod.SalesOrderID;
')
SELECT DurationMsec = datediff(ms, @StartTime, getdate())
GO
-- 121313
-- 40 msec (avg)
The OPENQUERY above is even faster than execution on the local DELLSTAR server since the query is executed on the linked server and only the few bytes of result is transferred on the network. Nevertheless it is still subject to unpredictability due to network performance.
The following article states that the PULL method (if available) is much faster than the PUSH method:
Linked servers and performance impact: Direction matters!
A WORKAROUND suggested by Paul Nielsen (http://www.sqlserverbible.com/) : don't use linked servers, instead push the task into the application layer because the application can connect to both servers at the usual (high) speed.
|