Back in the heyday of my tech enthusiasm, I often wondered why it was so hard to build data systems that had one source of data truth.
The data would be accurate and up to date. How hard could it be to make all systems draw from this central source? Updates would flow into this central system from satellite systems, and all would be right with the world.
And when the web came around, I was very certain that the golden age of data integrity was right around the proverbial corner.
Finally, a central source of data truth that was accessible to everyone.
Well, things didn’t work out the way I expected. There are a few sources of ‘centralized’ data out there, but I wouldn’t go so far as to call them sources of data truth.
It is not that these centralized sources set out to lie, but the method by which they centralized the information makes the information more variable than my naive younger self would have thought possible.
The idea of a data truth is about as realistic as there being one truth. An organization can believe it has its own
data truth, but some noise in that truth might actually be healthy.
Just as a monoculture in agriculture is not healthy, a mono truth may be equally unhealthy in a data ecosystem.