Friday, January 25, 2013

Rethinking the Dimensions of Data Quality

A few months ago, I wrote a column asking if the dimensions of data quality, such as accuracy, consistency and timeliness, are real. I pointed out that there are no generally accepted definitions for the dimensions, no generally accepted exhaustive list of them and no generally accepted methodologies for measuring each one.

Since the column was published, I have been "encouraged" to say something a little more positive on this topic – something that will help practitioners deal with the daunting problems of data quality. I agree that being negative is not that helpful, although it is refreshing to have a frank conversation about what really underlies terms that are often thrown about in our industry.

The Road to Abstraction

One argument in favor of having dimensions of data quality is that data quality is such an enormous space that we cannot deal with it effectively unless we break it down into subareas. I think it is better to say that data quality represents a large, complex set of issues and that we need to tease out individual types of issues, each with its own specific problems and requiring its own specific methods to deal with it. This is more of a bottom-up approach.

However, it seems to me that the first view prevails. The top-down approach is repeatedly taken, with attempts made to break data quality up into different dimensions. Here I am talking from experience: I have seen this approach first-hand at a number of conferences and industry initiatives.

This top-down approach focuses on abstractions, and we need to understand abstractions to appreciate what is going on. There are many different classes of abstraction, but the one involved here is the process of turning a property into an object. This is a source of argument among philosophers, going back to Plato, who believed that such abstractions really exist in some part of the universe, just as much as material objects do.

This form of abstraction is illustrated in the following example. Imagine I am represented by a customer record in a database of Enterprise X. This record holds my date of birth. If the value is my actual date of birth, we can agree it is completely accurate. If the month and year of this value are correct, but not the day, then we can say it is reasonably accurate. If the year is correct, but not the month and day, we can say it is moderately accurate. If day, month and year are incorrect, we can say it is not accurate at all. We are using the term "accurate" to describe the quality of the relationship between the data value and the reality it is trying to represent. 
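
The grading in this example can be written down as a minimal Python sketch. The function name grade_birth_date and the label strings are illustrative choices, not part of any standard. The sketch also assumes that any case the example does not spell out (for instance, a correct month with a wrong year) counts as "not accurate", and that a correct year alone is enough for "moderately accurate".

```python
from datetime import date


def grade_birth_date(recorded: date, actual: date) -> str:
    """Grade a recorded date of birth against the real one.

    The four labels follow the levels described in the text. How to treat
    combinations the text does not mention is an assumption of this sketch.
    """
    if recorded == actual:
        return "completely accurate"
    if recorded.year == actual.year and recorded.month == actual.month:
        # Month and year are right, only the day is wrong.
        return "reasonably accurate"
    if recorded.year == actual.year:
        # Only the year is right.
        return "moderately accurate"
    # Assumption: anything with a wrong year is treated as not accurate.
    return "not accurate"


actual = date(1970, 3, 15)
print(grade_birth_date(date(1970, 3, 15), actual))
print(grade_birth_date(date(1970, 3, 1), actual))
print(grade_birth_date(date(1970, 7, 1), actual))
print(grade_birth_date(date(1982, 7, 1), actual))
```

Note that the function returns a description of the relationship between a value and reality. It never returns an "accuracy" object, which is the point of the example.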

Human beings then make the leap from using "accurate" as an adjective to using "accuracy" as a noun. Our language allows us to do that, but that does not mean reality has to go along with us. We have created a type of abstraction and this type of abstraction (a) is not instantiated, (b) does not bear properties and (c) cannot enter into causal relationships. The concept "dog" is instantiated in my pet Leo, who weighs about eight pounds and knows perfectly well how to manipulate me into feeding him. The concept "accuracy" is not instantiated anywhere. We do not see "accuracies" lying around in the universe, or having attributes like color or weight, and “accuracy” does not enter into causal relationships. We can say that the birth date example above "represents an instance of accuracy," but this is a bewitchment of our language. Just because we can say it does not make it so. It’s better to say the example "has the property of being accurate."

But We Need the Dimensions of Data Quality

So far, this is still of little help to the practitioner. We know that data quality is large and complex, and it is our duty to improve it as much as we can for the enterprises we work for.

Here, I think the bottom-up approach is better. This does not view a dimension like "accuracy" as an abstracted object with a single definition. Rather, each dimension is a complex area that has its own structure, problems and methods.

If I think about accuracy, I can ask a series of questions, such as:

  • Is the thing being represented covered by the definition used for the entity/table in the database?
  • Is the attribute of the thing being represented covered by the definition used for the attribute/column in the database?
  • Does the thing represented by a record in the table in the database actually exist?
  • Does the value held in the column of the record in the table in the database objectively represent the expression of the attribute of the thing?
  • Does the value held in the column of the record in the table in the database subjectively represent the expression of the attribute of the thing to the extent needed to meet business requirements?

There are very likely even more questions pertaining to accuracy that can be asked. This shows that accuracy is a complex of many concepts, not a single concept. If you object that each question needs to be broken out as a different dimension, then you are going to end up with an awful lot of dimensions, as I have many such questions for each of the traditional dimensions. The traditional dimensions give the illusion that each is only answering one question, because somehow each has a single definition.

The questions cover the structure and problems of accuracy. Methods are another facet that needs to be addressed. Some methods include:

  • Testing the entity/table definitions with data producers to see if they correctly classify instances to the entity/table.
  • Checking that the attribute/column has not been deliberately repurposed by operational staff to hold something other than what the official definition describes.
  • Sampling records and auditing that these represent real-world instances.
  • Sampling data values and independently measuring the attributes of the things they represent.
  • Sampling data values, independently measuring the attributes of the things they represent and comparing these to the tolerances allowed in each business use case.

The first sample data set is always the hardest, but it can roughly tell you how good the data capture process is.
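
The last method in the list (sample values, measure independently, compare against a tolerance) can be sketched roughly as follows. Everything named here is hypothetical: the record layout with a "value" key, the measure_truth callback standing in for the independent audit, and the made-up numbers in the usage example. It is a sketch under those assumptions, not a prescribed procedure.

```python
import random


def estimate_accuracy(records, measure_truth, tolerance, sample_size, seed=0):
    """Estimate the share of sampled values that fall within tolerance.

    records: list of dicts with a numeric "value" key (hypothetical layout).
    measure_truth: callable returning the independently measured value
        for a record (hypothetical; in practice this is the audit step).
    tolerance: largest absolute difference the business use case allows.
    """
    if not records:
        raise ValueError("no records to sample")
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")
    rng = random.Random(seed)  # fixed seed so the audit sample is repeatable
    sample = rng.sample(records, min(sample_size, len(records)))
    within = sum(
        1 for record in sample
        if abs(record["value"] - measure_truth(record)) <= tolerance
    )
    return within / len(sample)


# Usage with made-up data: recorded values versus independently measured ones.
records = [
    {"id": 1, "value": 100},
    {"id": 2, "value": 205},
    {"id": 3, "value": 290},
    {"id": 4, "value": 400},
]
measured = {1: 100, 2: 200, 3: 300, 4: 400}
rate = estimate_accuracy(
    records, lambda r: measured[r["id"]], tolerance=5, sample_size=4
)
print(rate)
```

The same sample gives a different answer for each tolerance, which mirrors the distinction between the objective and the business-requirement questions above: the data has not changed, only the use case has.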

Thursday, January 10, 2013

How to Allocate Your Time, and Your Effort

"How does he find time to meet with 10 customers a week and make his yearly quota in the first quarter?" a salesman wonders about his top-producing coworker. "I can barely find time to have five appointments a week and get all my paperwork done correctly and turned in on time."

"How does she manage to champion strategic initiatives, network with executives, and only work 40 hours a week?" a manager ponders about his colleague on the corporate fast track. "After a day full of project meetings, the best I can do is reactively respond to e-mail at night instead of proactively developing my department."

Here's the secret: Your colleagues who zoom ahead of you with seemingly less effort have learned to recognize and excel in what really counts, and to aim for less than perfect in everything else.

Most likely the highest-producing salesman on your team spends less than half the time that you do filling out paperwork. Yes, it may be sloppy, but no one really cares because he's skyrocketing the revenue numbers. The manager who has caught the eye of upper management may send e-mails with imperfect grammatical structure and decline invites to tactical meetings. But when a project or meeting really matters, she outshines everyone.

If you're shocked and feel like this seems completely unfair, I'm guessing that you probably performed very well in school, where perfectionism is encouraged.

I know. I was a straight-A student from sixth grade through college graduation who did whatever it took to produce work at a level that would please my professors. Admittedly, this strategy paid off as a student. My perfect GPA signified an exceptional level of achievement, and I was fortunate that in my case, it was rewarded with scholarships and job offers.

The rules changed when I started my own business over seven years ago. I realized that doing A-work in everything limited my success and that I needed to focus more on my strengths. As Tom Rath wisely explains in his StrengthsFinder books, you can achieve more success by fully leveraging your strengths instead of constantly trying to shore up your weaknesses. Deliberately deciding where to invest more time and energy to produce stellar work, and where less-than-perfect execution has a bigger payoff, has had a profound impact on my own approach to success and on my ability to empower clients who feel overwhelmed.

As I talk with time coaching clients struggling with overwhelm, whether they be professors, executives, or lawyers, a common theme comes up: they can't find time to do everything. And they're right: no one has time for everything. Given the pace of work and the level of input in modern society, time management is dead. You can no longer fit everything in, no matter how efficient you become. (This conundrum is what inspired me to write a book on time investment.)

In my time investment philosophy, I encourage individuals to see time as the limited resource it is and to allocate it in alignment with their personal definition of success. That leads to a number of practical ramifications:

  • Decide where you will not spend time: Given that you have a limited time budget, you will not have the ability to do everything you would like to do regardless of your efficiency. The moment you embrace that truth, you instantly reduce your stress and feelings of inadequacy. For example, professionally this could look like reducing your involvement in committees, and personally this could look like hiring someone else to do lawn maintenance or finish up a house project.
  • Strategically allocate your time: Boundaries on how and when you invest time in work and in your personal life help to ensure that you have the proper investment in each category. As a time coach, I see that one of the most compelling reasons for not working extremely long hours is that this investment of time resources leaves you with insufficient funds for activities like exercise, sleep, and relationships.
  • Set up automatic time investment: Just as you set up automatic financial investment in mutual funds in your retirement account, your daily and weekly routines should make your time investment close to automatic. For example, at work you could have a recurring appointment with yourself two afternoons a week to move forward on key projects, and outside of work you could sign up for a fitness boot camp where you would feel bad if you didn't show up and sweat three times a week.
  • Aim for a consistently balanced time budget: Given the ebbs and flows of life, you can't expect that you will have a constantly balanced time budget but you can aim for having a consistently balanced one. Over the course of a one- to two-week period, your time investment should reflect your priorities.

Once you have allocated your time properly, you also need to approach the work within each category differently. As I explained above, trying to "get As in everything" keeps you from investing the maximum amount of time in what will bring the highest return on your investment. That's why I developed the INO Technique to help overcome perfectionism and misallocation of your 24/7. Here's how it works:

When you approach a to-do item, you want to consider whether it is an investment, neutral, or optimize activity. Investment activities are areas where an increased amount of time and a higher quality of work can lead to an exponential payoff. For instance, strategic planning is an investment activity; so is spending time, device-free, with the people you love. Aim for A-level work in these areas. Neutral activities just need to get done adequately; more time doesn't necessarily mean a significantly larger payoff. An example might be attending project meetings or going to the gym. These things need to get done, but you can aim for B-level work. Optimize activities are those for which additional time spent leads to no added value and keeps you from doing other, more valuable activities. Aim for C-level work in these — the faster you get them done, the better. Most basic administrative paperwork and errands fit into this category.

The overall goal is to minimize the time spent on optimize activities so that you can maximize your time spent on investment activities. I've found that this technique allows you to overcome perfectionist tendencies and invest in more of what actually matters so you can increase your effectiveness personally and professionally.

On a tactical level, here are a few tips on how you can put the INO Technique into action:

  • At the start of each week, clearly define the most important investment activities and block out time on your calendar to complete them early in the week and early in your days. This will naturally force you to do everything else in the time that remains.

  • When you look over your daily to-do list, put an "I," "N," or "O" beside each item and then allocate your time budget accordingly, such as four hours for the "I" activity, three hours for the "N" activities, and one hour for the "O" activities.

  • If you start working on something and realize that it's taking longer than expected, ask yourself, "What's the value and/or opportunity cost in spending more time on this task?" If it's an I activity and the value is high, keep at it and take time away from your N and O activities. If it falls into the N category and there's little added value or the O category and spending more time keeps you from doing more important items, either get it done to the minimum level, delegate it, or stop and finish it later when you have more spare time.

  • If you keep a time diary or mark the time you spent on your calendar, you can also look back over each week and determine if you allocated your time correctly to maximize the payoff on your time investment.
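
The tagging tip above can be sketched in a few lines of Python. This is only an illustration: the function name allocate_hours, the task names, and the even split of a category's hours across its tasks are assumptions of the sketch. The 4/3/1 hour budget is the example split mentioned in the tip.

```python
from collections import defaultdict

# Hours per category, taken from the example split in the text (4 / 3 / 1).
DEFAULT_BUDGET = {"I": 4.0, "N": 3.0, "O": 1.0}


def allocate_hours(tasks, budget=None):
    """Turn (task, category) pairs into a task -> hours plan.

    tasks: list of (name, category) tuples, category being "I", "N" or "O".
    budget: hours available per category; defaults to DEFAULT_BUDGET.
    """
    budget = DEFAULT_BUDGET if budget is None else budget
    by_category = defaultdict(list)
    for name, category in tasks:
        if category not in budget:
            raise ValueError(f"unknown category {category!r} for task {name!r}")
        by_category[category].append(name)
    plan = {}
    for category, names in by_category.items():
        # Assumption: a category's hours are split evenly across its tasks.
        share = budget[category] / len(names)
        for name in names:
            plan[name] = share
    return plan


# Hypothetical to-do list for one day.
today = [
    ("strategic planning", "I"),
    ("project meeting", "N"),
    ("gym", "N"),
    ("expense report", "O"),
]
plan = allocate_hours(today)
for name, hours in plan.items():
    print(f"{name}: {hours:.1f} h")
```

Comparing a plan like this with a time diary at the end of the week is one simple way to check whether the hours actually went where the "I" labels said they should.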
