
Why fragmentation occurs and how to avoid/fix it.

Let's suppose you have this table:



We have made the last name the primary key of this table.
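The original table definition is not reproduced here, so the following is only a minimal sketch of what such a table might look like. The table name `dbo.Customer`, the constraint name `PK_Customer`, and every column other than the last name are assumptions made for illustration.

```sql
-- Hypothetical table: only "last name is the primary key" comes from the post.
-- The other columns and all names are placeholders.
CREATE TABLE dbo.Customer
(
    LastName  varchar(50) NOT NULL,
    FirstName varchar(50) NULL,
    City      varchar(50) NULL,
    -- A primary key is clustered by default; CLUSTERED is spelled out here
    -- to make the storage order explicit.
    CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED (LastName)
);
```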
Now let's import some rows into the table and check the fragmentation:
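One way to check fragmentation from a query window is the `sys.dm_db_index_physical_stats` dynamic management function. This is a sketch that assumes the table is named `dbo.Customer` (a hypothetical name) and that you are connected to the database that contains it.

```sql
-- Report fragmentation for every index on the (hypothetical) dbo.Customer table.
SELECT
    i.name AS index_name,
    ips.index_type_desc,
    ips.avg_fragmentation_in_percent,
    ips.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(),                      -- current database
         OBJECT_ID(N'dbo.Customer'),   -- table to inspect (assumed name)
         NULL,                         -- all indexes
         NULL,                         -- all partitions
         'LIMITED') AS ips             -- fastest scan mode
JOIN sys.indexes AS i
    ON  i.object_id = ips.object_id
    AND i.index_id  = ips.index_id;
```

The `avg_fragmentation_in_percent` column is the figure to watch.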



Notice the fragmentation at 96.48% and all we did was one import of 16426 records. So now we have several questions:
  1. How bad is this? Fragmentation forces SQL Server to skip around the disk to read data from your tables. For a one-time read on a small table, this is virtually meaningless. In a job that needs to read millions of records, it could tear apart your performance.

  2. OK, this is bad. How did it happen? By default, SQL Server creates a table's primary key as a clustered index. This means the rows are stored in primary key order; in our case, by last name. When data doesn't arrive in the order it is stored (i.e. customer names come in randomly rather than alphabetically), SQL Server must constantly split pages to keep the rows in order. This causes the data to be "fragmented" into multiple areas of the disk instead of one continuous stream.

  3. How can I fix it? Running regular maintenance on the table will fix the problem: reorganize or rebuild the indexes. If fragmentation is less than 30%, reorganize; if it is 30% or higher, rebuild.
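The 30% rule of thumb above could be scripted roughly as follows. This is a sketch, not a full maintenance job: the table name `dbo.Customer` is an assumption, and it only looks at the clustered index (index_id 1) when deciding what to do.

```sql
DECLARE @frag float;

-- Read fragmentation of the clustered index (index_id = 1)
-- on the hypothetical dbo.Customer table.
SELECT @frag = MAX(avg_fragmentation_in_percent)
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.Customer'), 1, NULL, 'LIMITED');

IF @frag >= 30
    -- 30% or higher: rebuild the indexes from scratch
    ALTER INDEX ALL ON dbo.Customer REBUILD;
ELSE IF @frag > 0
    -- below 30%: reorganize the existing pages in place
    ALTER INDEX ALL ON dbo.Customer REORGANIZE;
```

To target a single index instead of all of them, replace `ALL` with the index name.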

How can I avoid this problem? Store the data in the order it comes in. For random last names this is impossible, right? Wrong! One way to ensure data is always stored in the order it arrives is to create an identity column. This is a meaningless number that is simply incremented each time a row is added. Set your primary key to this identity column and every new row arrives in numeric order, one higher than the last record. Creating an identity column is simple enough: go into the properties of the column you are creating, and under Identity Specification set (Is Identity) to Yes, and you are done. The column increments automatically, so don't supply a value for it during your import.


So here I have re-created the table with CustomerNumber as the primary key. CustomerNumber is also an identity column.
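In T-SQL, a table along these lines might look like the sketch below. Only `CustomerNumber` as an identity primary key comes from the post; the table name `dbo.CustomerByNumber`, the constraint name, the other columns, and the sample rows are assumptions for illustration.

```sql
-- Hypothetical re-creation of the table with an identity primary key.
CREATE TABLE dbo.CustomerByNumber
(
    CustomerNumber int IDENTITY(1, 1) NOT NULL,  -- starts at 1, increments by 1
    LastName       varchar(50) NOT NULL,
    FirstName      varchar(50) NULL,
    CONSTRAINT PK_CustomerByNumber PRIMARY KEY CLUSTERED (CustomerNumber)
);

-- Leave CustomerNumber out of the column list; SQL Server assigns it.
-- The rows below are made-up sample data.
INSERT INTO dbo.CustomerByNumber (LastName, FirstName)
VALUES ('Smith', 'Ann'),
       ('Adams', 'Bob');
```

Because each new row gets the next number, it is appended at the end of the clustered index regardless of the last name's alphabetical position.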




Importing the same rows yields this result on fragmentation:



1.79% fragmentation. Much better!





