What is data lineage?
Ok let us suppose you built a wonderful database with loads of data coming from source files from your vendors, your own AS400, and different departments of your own organization. All of this data has been imported into 200 tables in this database producing a plethera of information that is used for reporting purposes. One day a department head comes to you and says, "I think this number here on this report is wrong. Where did you get it from?" Do you escort that person to the computer room and show him/her your server? No, I didn't think so. So how do you come up with where the information came from? Likely you find a data load expert in your IT department and have him/her spend the next several minutes/hours/days rummaging through stored procedures, ssis packages, dts packages, custom applications, etc trying to find this information. However, if you had this:

you could simply pull up the history of how that data came to be loaded and point. How useful is this? Well it would save that poor data expert of yours minutes/hours/days per senseless inquiry where he/she could be cranking out the next great package. Basically data lineage is invaluable.
How hard is data lineage to come by? Well unfortunately right now you don't have an easy solution in SQL Server. There are plenty of examples on how to record your source information to the destination in separate tables in sql server. Does this give you user friendly access to your data lineage no. You can spend a fortune on products like informatica which admittedly do a wonderful job of graphically representing your data lineage but what if you are on a budget? What if you are in a small shop and the owner keeps coming to you several times a day/week/month asking questions about where you came up with numbers on his/her report? Sadly, I have led you to this climatic point only to tell you there is no answer that I would hang my hat on. There is open source data lineage that is graphical in nature but the last time I looked at it I would liken it to heiroglyphics drawn by our ancestors on the walls of caves. Pathetic and lacking. I have asked microsoft to include data lineage in future releases of SQL Server. I haven't heard a word back on whether this will happen or not. My guess is not anytime soon but we can still be hopeful.
Ok let us suppose you built a wonderful database with loads of data coming from source files from your vendors, your own AS400, and different departments of your own organization. All of this data has been imported into 200 tables in this database producing a plethera of information that is used for reporting purposes. One day a department head comes to you and says, "I think this number here on this report is wrong. Where did you get it from?" Do you escort that person to the computer room and show him/her your server? No, I didn't think so. So how do you come up with where the information came from? Likely you find a data load expert in your IT department and have him/her spend the next several minutes/hours/days rummaging through stored procedures, ssis packages, dts packages, custom applications, etc trying to find this information. However, if you had this:
you could simply pull up the history of how that data came to be loaded and point. How useful is this? Well it would save that poor data expert of yours minutes/hours/days per senseless inquiry where he/she could be cranking out the next great package. Basically data lineage is invaluable.
How hard is data lineage to come by? Well unfortunately right now you don't have an easy solution in SQL Server. There are plenty of examples on how to record your source information to the destination in separate tables in sql server. Does this give you user friendly access to your data lineage no. You can spend a fortune on products like informatica which admittedly do a wonderful job of graphically representing your data lineage but what if you are on a budget? What if you are in a small shop and the owner keeps coming to you several times a day/week/month asking questions about where you came up with numbers on his/her report? Sadly, I have led you to this climatic point only to tell you there is no answer that I would hang my hat on. There is open source data lineage that is graphical in nature but the last time I looked at it I would liken it to heiroglyphics drawn by our ancestors on the walls of caves. Pathetic and lacking. I have asked microsoft to include data lineage in future releases of SQL Server. I haven't heard a word back on whether this will happen or not. My guess is not anytime soon but we can still be hopeful.
Comments
Post a Comment