June 11, 2023


How To Apply The "PPDAC" Framework To Your Data Science Problem


The "PPDAC" problem-solving cycle is a handy framework for applying the rigor of the scientific method to your data science problem. Any specific statistical technique can be seen as one small component of this complete end-to-end cycle of problem-solving.

PPDAC Problem-Solving Cycle

The key thing to observe in the image above is that the entire process is iterative: we iterate through the whole PPDAC cycle as well as within each step of the cycle.

The framework was developed by R. J. MacKay and R. W. Oldford. Most recently it was popularized by David Spiegelhalter in his book "The Art of Statistics: Learning from Data".

Problem


As always, the first step is to understand and define the problem you are trying to solve.

Here are some questions to help you think about the problem at hand:

  • What is the problem you are trying to solve?
  • What is the precise question you are looking to answer?
  • Do you understand the problem?
  • Can you explain your question to a five-year-old?
  • Have you written down the question?
  • Can the problem be solved with the information available?

Plan


While it may be tempting to start the analysis as soon as you have the data, a well-thought-out design plan for your study can save you plenty of time and rework.

Here are some questions to help you think about your plan:

  • What to measure?
  • How to measure?
  • Did you choose a sample that is truly representative, or was the current sample chosen merely because it is convenient and inexpensive?
  • Is there bias in your data?
  • What was omitted from the data?
  • What is your end goal?
  • How do you know you are done?
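
As a rough illustration of the sampling question above, the sketch below compares how well a convenience sample and a simple random sample reflect the make-up of a population. The population, its "urban"/"rural" categories and the helper name `max_share_gap` are all hypothetical, and a real study would usually check several attributes rather than one.

```python
import random
from collections import Counter

def max_share_gap(population, sample):
    """Largest absolute difference in category share between population and sample."""
    pop_counts = Counter(population)
    sample_counts = Counter(sample)
    return max(
        abs(pop_counts[c] / len(population) - sample_counts[c] / len(sample))
        for c in pop_counts
    )

# Hypothetical population: 60 urban and 40 rural respondents.
POPULATION = ["urban"] * 60 + ["rural"] * 40

# Convenience sample: simply the first 20 records that were easy to reach.
CONVENIENCE_SAMPLE = POPULATION[:20]

# Simple random sample of the same size (seeded so the run is repeatable).
RANDOM_SAMPLE = random.Random(0).sample(POPULATION, 20)

CONVENIENCE_GAP = max_share_gap(POPULATION, CONVENIENCE_SAMPLE)
RANDOM_GAP = max_share_gap(POPULATION, RANDOM_SAMPLE)

print(f"convenience sample gap: {CONVENIENCE_GAP:.2f}")
print(f"random sample gap:      {RANDOM_GAP:.2f}")
```

A large gap for the convenience sample is a warning sign that the sample over-represents whichever group was easiest to collect. A random sample will usually show a smaller gap, although this is not guaranteed for any single draw.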

Data


At the heart of your "Data Science" problem is the data itself.

Here are some questions to help you think about your data:

  • How was or how will the data be collected?

  • How can you improve the quality of your data?

  • How do you plan to apply the following "first mile services"?

    • Data Cleaning
    • Data Transformation
    • Data Pipelines
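
To make the three "first mile services" above a little more concrete, here is a minimal sketch of a cleaning step and a transformation step chained into a tiny pipeline. The records, field names and functions are made up for illustration, and real pipelines typically add validation, logging and scheduling on top of this.

```python
# Hypothetical raw records; the field names are illustrative only.
RAW_ROWS = [
    {"city": " Pune ", "temp_c": "31.5"},
    {"city": "Delhi", "temp_c": ""},      # missing measurement
    {"city": "pune", "temp_c": "29.0"},
    {"city": "Delhi", "temp_c": "n/a"},   # unparseable measurement
]

def clean(rows):
    """Data cleaning: normalise text and drop rows with unusable measurements."""
    cleaned = []
    for row in rows:
        city = row["city"].strip().title()
        try:
            temp_c = float(row["temp_c"])
        except ValueError:
            continue  # skip rows whose measurement cannot be parsed
        cleaned.append({"city": city, "temp_c": temp_c})
    return cleaned

def transform(rows):
    """Data transformation: derive a Fahrenheit field from the Celsius one."""
    return [{**row, "temp_f": row["temp_c"] * 9 / 5 + 32} for row in rows]

def pipeline(rows):
    """Data pipeline: run the steps in a fixed, repeatable order."""
    return transform(clean(rows))

RESULT = pipeline(RAW_ROWS)
for row in RESULT:
    print(row)
```

Keeping each step as a small function makes it easier to record what was dropped or changed, which in turn helps answer the earlier question about what was omitted from the data.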

Analysis


This is arguably the most interesting phase of the PPDAC cycle.

Here are some questions to help you think about your analysis process:

  • Did you find and resolve the following classes of errors?

    • Coding errors
    • Human errors
    • Data errors
  • Did you avoid common psychological fallacies? See our blog post for more on this.

  • Did you use one or more of the following "first mile services"?

    • Exploratory Data Analysis
    • Data Dashboards
    • Feasibility Study
  • Have you labelled, classified and sorted your data appropriately?

  • Are you using an appropriate representation for your data, such as tables, charts or graphs?

  • What patterns do you see?

  • What hypotheses can you generate?
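
A first pass at exploratory data analysis can be as simple as a few summary statistics plus a check for unusual values. The sketch below uses only Python's standard library. The measurements are invented, and the 1.5 × IQR rule used to flag outliers is one common convention, not something the PPDAC framework prescribes.

```python
import statistics

def summarize(values):
    """Return basic summary statistics and values outside the 1.5 * IQR fences."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": median,
        "q1": q1,
        "q3": q3,
        "outliers": [v for v in values if v < lower or v > upper],
    }

# Hypothetical measurements; 95 may be a data-entry error or a genuine extreme.
MEASUREMENTS = [12, 14, 15, 15, 16, 18, 95]

SUMMARY = summarize(MEASUREMENTS)
for name, value in SUMMARY.items():
    print(f"{name}: {value}")
```

A flagged value is a prompt to investigate, not an instruction to delete: it may point to one of the coding, human or data errors listed above, or to a pattern worth turning into a hypothesis.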

Conclusion


The last step of the PPDAC cycle is to finally answer the question that you set out to answer and communicate it to your audience/stakeholders.

Here are some broad questions to help you think about your conclusions:

  • Are your results reproducible?
  • Do your conclusions duly acknowledge all limitations?
  • Is there any selective reporting in your conclusions?
  • Is this study exploratory or confirmatory?
  • Which p-values are you reporting to summarize the strength of evidence for your conclusions?
  • How do you plan to communicate your conclusions?
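
For readers who want to see where a p-value can come from, here is a minimal sketch of an exact two-sided permutation test for a difference in means between two small groups. The data and the function name `permutation_p_value` are hypothetical, and this is only one of many ways to quantify evidence. It enumerates every possible split, so it is practical only for small samples.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for the difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))
    extreme = 0
    total = 0
    # Enumerate every way of relabelling n_a of the pooled values as "group A".
    for chosen in combinations(indices, n_a):
        chosen_set = set(chosen)
        first = [pooled[i] for i in chosen]
        second = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(first) - mean(second)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical measurements from two small groups.
GROUP_A = [12.1, 11.8, 12.5, 12.0, 12.3]
GROUP_B = [10.9, 11.1, 10.7, 11.3, 11.0]

P_VALUE = permutation_p_value(GROUP_A, GROUP_B)
print(f"permutation p-value: {P_VALUE:.4f}")
```

The p-value is the share of relabellings that produce a difference at least as large as the one observed. It summarizes the evidence against "no difference between groups", and it should be reported alongside the limitations and the exploratory or confirmatory nature of the study.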

At the end of a single iteration of the PPDAC cycle, you usually end up with a set of conclusions. This naturally gives rise to more questions and so the PPDAC cycle begins again for the next iteration.

It is likely that for a seasoned Data Scientist, most of these steps are second nature and intuitive. However, it is still highly beneficial to use this as one of the formal frameworks to solve your data science problem. This way you can also communicate more clearly with your teammates and never lose sight of the bigger picture.

On a final note, you may also find the following related article helpful: "Ten Simple Rules For Effective Statistical Practice".