Replication= Bye Bye, Distribution Database!

We are in the middle of a migration and we need your help urgently! Replication is not working and the distribution database was deleted! The maintenance window is close to finish.

Hi, SQL Server Lovers!

Today I’ll talk about one critical situation that happened to me this week. Last Friday, the distribution database was accidentally deleted in one of our apps…. A team from the app was migrating from 2008 R2 to 2016 and they had this big issue:

  • Topology

Publisher and distributor on the same server with SQL Server 2016 SP1 Version

Subscriber with SQL Server 2016 SP1 Version

  • Issue:

A DBA tried to configure replication, but since there were some issues, he tried to remove the replication and start from scratch. Unfortunately, the disable part was not working, so he decided to execute this command:

PLEASE, IF YOU HAVE THIS SITUATION DO NOT DELETE THE DISTRIBUTION DATABASE!

FIX THE ISSUE THAT YOU ARE GETTING TO DISABLE THE REPLICATION!

ALTER DATABASE DISTRIBUTION SET OFFLINE;

GO

DROP DATABASE DISTRIBUTION;

GO

Which means that distribution DB was deleted… Obviously he tried to disable replication and he couldn’t:

Error:

Msg 911, Level 16, State 4, Procedure sp_MSrepl_helpsubscriberinfo, Line 120 [Batch Start Line 0]
Database ‘distribution’ does not exist. Make sure that the name is entered correctly.

  • Solution

So, here you have 2 ways:

  • Easy Way

After you delete the distribution database, the Database files remain on the same place (unless you delete them), so you should execute this script in order to create again the database:

CREATE DATABASE Distribution ON PRIMARY
( NAME = N’distribution’,

FILENAME = N’Last file path\Name.mdf’ ,

SIZE = 8192KB , FILEGROWTH = 65536KB )
LOG ON
( NAME = N’distribution_log’,

FILENAME = N’Last file path\Name.ldf’ , SIZE = 8192KB , FILEGROWTH = 65536KB )
FOR ATTACH;

Then you can disable again the replication:

  • Hard Way

After you deleted the distribution database, you also removed the files from the hard drivers and the recycle bin (Yes, believe it or not, this was my situation!). So, to fix this, you can follow this steps:

  1. Create a new distribution database:

USE MASTER

GO

EXEC SP_ADDDISTRIBUTIONDB @database = N’Distribution_New’,
@data_folder = ‘C:\Temp’,  –Choose your own folder name
@data_file = N’Distribution_New.MDF’, –Choose your own file name
@data_file_size = 128,
@log_folder = ‘C:\Temp’,  –Choose your own folder name
@log_file = N’Distribution_New.LDF’,  –Choose your own file name
@log_file_size = 64,
@min_distretention = 0,
@max_distretention = 72,
@history_retention = 48,
@security_mode = 1

2) Stop SQL Server Services

3) Copy/Paste the distribution_new files and rename them as the old distribution files database names

— Copy/Paste Distribution DB files

copy_pate

–Rename Both distribution DB files

Rename

4) Start SQL Server Service

5) Execute this script to attach the database again

CREATE DATABASE Distribution
ON PRIMARY
( NAME = N’distribution’, FILENAME = N’C:\temp\Distribution.mdf’ , SIZE = 8192KB , FILEGROWTH = 65536KB )
LOG ON
( NAME = N’distribution_log’, FILENAME = N’C:\temp\Distribution.ldf’ , SIZE = 8192KB , FILEGROWTH = 65536KB )
FOR ATTACH

6) Disable replication and distribution:

— Remove the local server as a Distributor.

USE master
GO

EXEC sp_dropdistributor @no_checks = 1;

Have you ever faced this same situation?? Can you add any insights?

Happy new year, sql fans!

Advertisements

Listener multisubnets + Replication= Where is the data??

“I migrated from standalone to HA using a listener multisubnet and replication is not working. I need your help immediately!”

Hi, SQL Server lovers!

Today I’ll talk about one situation that took me an entire year to fix/workaround/make a decision. Long story short: We migrated some instances from one data center to another. The particular situation was that we used a SQL Server Always on feature with listeners multisubnets, instead of single subnet.

Topology:

  • SQL Server 2014 Enterprise edition
  • Always on environment acting as publisher + Listener multisubnet
  • Remote Distributor
  • Always on Environment acting as subscriber + Listener multisubnet

The thing was that we didn’t know that some instances were using replication too…

So we configured a replication using a listener as a publisher and another listener as a suscriber (you can follow this post to do it this way).

Everything was working properly until one day I received a call saying: “Gonzalo: Replication is not working and our customers are waiting for the information for their reports. We need your help immediately because today we have to send the information to the leadership”. (Yes, as you can see that in a database world, everything is ASAP….).

After I connected to the instance, I saw this particular issue:

Unable to query the redirected server ‘Listener name’ for original publisher ‘Original publisher’ and publisher database ‘Database Name’ to determine the name of the remote server; Error 7303, Error message ‘Error 7303, Level 16, State 1, Message: Cannot initialize the data source object of OLE DB provider “SQLNCLI11” for linked server “Linked server hash”.’. (Source: MSSQL_REPL, Error number: MSSQL_REPL21879)

(You can also face the same situation on the subscriber as well. The error is: Cannot connect to subscriber Listener name)

So, after some hours, I connected to the distributor and saw that the publisher listener was not responding to the ping (it was pointing to the old ip). I executed a IPCONFIG /FLUSHDNS (2 or 3 times) and, after that, the listener started to ping and the situation solved.

But… It happened 3 or 4 times the same week = Leadership pushing for a permanent solution.

We opened a case to Microsoft and they said that the solution was to automatize this flushdns command. Some months ago, one Microsoft engineer posted a workaround, but these are my final findings on this issue:

  • Issue:

Replication use OLEDB driver to connect, which means that MULTISUBNET FAILOVER Clause is not supported and if the DNS change on your distributor replication will fail

  • Workarounds:
  1. Automate ipconfig /flushdns                    -> For us it didn’t work
  2. Reduce TTL parameter for the DNS        -> For us it didn’t work
  3. Create another listener on the always on + set parameter RegisterAllProvidersIP to 0 + HostRecordTTL = 60 + increase time out time for the replication profile -> The thing for this is that you will have 2 dependencies for the always on role and also…. In case of a failover  you will have to wait for the replication to start = No permanent solution
  • Our decission:

Since we have more than 30 Web servers pointing to the listener, we keep RegisterAllProvidersIP = 1 to use multisubnet listener clause and use a different approach to move data (Integration Services). Yes, it is true that we don’t have the data in seconds ( We moved the data every 5 minutes) but we achieve the challenge avoiding issues.

  • Permanent solution:

Some months ago, Pedro Lopez blogged about a new release for OLEDB driver, that could support MultisubnetFailover Clause. I hope that we can apply this new version to replication soon, so we can avoid this kind of issues!

Did you face the same situation? Did you use the same approach? Please share your findings too!

 

 

Columnstores: ETL Nightmare

“We are using partitioning and columnstore indexes but after our ETL process nothing is working as expected”

Hi, SQL Server lovers!

Today, I’ll talk about one situation that happened to me this week. I was starting my day smoothly, until I received one email saying: “Gonzalo, we need your help immediately. Our interfaces are not working. It seems that something is happening on the database engine”. So, I jumped into a bridge call and found some interesting things:

  • Topology

SQL Server 2016 SP 1 + Database using partitioning + Clustered Columnstore indexes

  • Issue

After they executed an ETL process, I found that a clustered columnstore index have more than 500 rowgroups for the same partition boundary with less than 100.000 rows each.

  • Qry

Credits to Nick Neugebauer (http://www.nikoport.com/columnstore/)

You can connect to this link -> https://github.com/NikoNeugebauer/CISL/tree/master/SQL-2016  And Download and execute this File -> row_groups_details.sql to check all the details in your columnstore indexes. 

  • Solution

I executed a rebuild only on the problematic partition boundary and that fixed our issue:

Command ->

ALTER INDEX Columnstore_Index on [Schema].[TableName] REBUILD PARTITION = 4 WITH (MAXDOP=1)

  • Preventive actions

Every time that you insert data on your columnstore indexes, check row group sizes and states (compressed opened etc etc) + Delta stores (open/closed)

I hope this helps!

Please use the comments for feedback, questions or just to suggest topic for next posts!

Partitioning & Boundaries: How to fix it when you mess up

Partitioning: “I forgot to add a new boundary. I’m executing a split partition command and transaction log is full + database turn into recovery. What should I do?”

Hi, SQLServer lovers!

My first post will talk about a situation that happened to me many times. I received calls/emails from clients saying: “Gonzalo! We need your help now! We are getting performance issues in production. We found that this query is not working properly and also we received alerts that the mount point is almost full. We are using partitioning and we don’t know why this mount point is full, since we are adding data for the new fiscal year… Something is happening to the DB engine. We need to send this report ASAP to the leadership and we need your immediate assistance!!”

Who else faced the same situation? So below are the steps that you need to follow:

  • Issue:

I found that partitioning configuration didn’t have the proper boundaries:

  • Qry:

–Check rows per partition boundary

DECLARE @TableName NVARCHAR(200) = N’Tablename’

SELECT SCHEMA_NAME(o.schema_id) + ‘.’ + OBJECT_NAME(i.object_id) AS [object]
, p.partition_number AS [p#]
, fg.name AS [filegroup]
, p.rows
, au.total_pages AS pages
, CASE boundary_value_on_right WHEN 1 THEN ‘less than’ ELSE ‘less than or equal to’ END as comparison
, rv.value
, CONVERT (VARCHAR(6), CONVERT (INT, SUBSTRING (au.first_page, 6, 1) +
SUBSTRING (au.first_page, 5, 1))) + ‘:’ + CONVERT (VARCHAR(20),
CONVERT (INT, SUBSTRING (au.first_page, 4, 1) + SUBSTRING (au.first_page, 3, 1) + SUBSTRING (au.first_page, 2, 1) +
SUBSTRING (au.first_page, 1, 1))) AS first_page
FROM sys.partitions p
INNER JOIN sys.indexes i
ON p.object_id = i.object_id
AND p.index_id = i.index_id
INNER JOIN sys.objects o
ON p.object_id = o.object_id
INNER JOIN sys.system_internals_allocation_units au
ON p.partition_id = au.container_id
INNER JOIN sys.partition_schemes ps
ON ps.data_space_id = i.data_space_id
INNER JOIN sys.partition_functions f
ON f.function_id = ps.function_id
INNER JOIN sys.destination_data_spaces dds
ON dds.partition_scheme_id = ps.data_space_id
AND dds.destination_id = p.partition_number
INNER JOIN sys.filegroups fg
ON dds.data_space_id = fg.data_space_id
LEFT OUTER JOIN sys.partition_range_values rv
ON f.function_id = rv.function_id
AND p.partition_number = rv.boundary_id
WHERE i.index_id < 2
AND o.object_id = OBJECT_ID(@TableName);

  • Example:

For FY17 the value column is NULL, which means that there is no boundary for that fiscal year (We used left range value on the function).

boundaries

  • Solution:

The solution for this issue is using switch out-switch in command + Split ( if you have this scenario please do not use SPLIT without removing the data!! Split command create a lot of entries on the transaction log… And if you fill up the transaction log the database will turn into recovery state!). I followed this excellent post from Kalyan Yella and Denzil Ribeiro -> https://blogs.msdn.microsoft.com/sql_pfe_blog/2013/08/13/oops-i-forgot-to-leave-an-empty-sql-table-partition-how-can-i-split-it-with-minimal-io-impact/. Also if you have more than one partitioned table, you will have to repeat the same for all tables.

  • To check all the tables that exist on the partition, you can do this:

In management studio, expand your database and go to Storage tab, expand it and go to Partition Schemes tab. Search your partition scheme, right click and click on view dependencies:

dependencies

dependencies

Preventive actions: Check boundaries!

I hope this helps! 😉

Share your comments or questions below!

And if you have any specific issue that want me to discuss, please add it on the comments!

Welcome! Bienvenido! Benvenuto! Willkommen! Bienvenue!

My name is Gonzalo Bissio. I’m 29, from Buenos Aires, Argentina.

I´m married, have 5 dogs, a lovely mother, 2 brothers and 2 sisters & 7 adorable nepheus. I’m a rocky movie fan and also soccer lover (Club Atlético Independiente, from Avellaneda!!).

This blog is meant to share my experiences and knowldege about SQL Server. I’ll try to help you on all those situations where you are under pressure and need to resolve issues ASAP, or when the fire has just passed, and you need to find RCAs.

Welcome and hope you enjoy my blog!