Data Analysis Methodology
Defining an analytics roadmap, and a strategic framework to apply it, is key to the success of any data analytics project. Every data analytics project must start with a clear understanding of the customer requirements and organizational needs. This should be followed by an established data discovery lifecycle that structures the project. The discovery lifecycle is key to successfully planning and executing all steps from start to finish.
Because the discovery lifecycle is iterative, the high-level goal or problem statement can be refined continuously. The objective is to refine the business questions and models until you arrive at the desired final model, which can then be operationalized. The discovery lifecycle is loosely based on CRISP-DM (Cross-Industry Standard Process for Data Mining), a widely adopted methodology for data mining and knowledge discovery.
CRISP-DM is a measured, step-by-step approach that takes a systems perspective on managing the complete lifecycle of analytic initiatives. The process is broken into six major phases, but the sequence is not strict: implementers are free to move back and forth between phases as needed.
In the figure above, the outer loop represents the cyclical nature of the data mining and knowledge discovery process. Typically, the discovery process continues even after the model is built and operationalized. Analysts can keep using the process to look for new insights, with the results of each iteration triggering new questions and building on what was learned in previous ones.
The discovery lifecycle includes the following major stages, which should be executed iteratively:
Understand Business Needs
Acquire and Understand Data
Build and Refine Model
Evaluate Model
Operationalize Model
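The iterative nature of these stages can be illustrated with a minimal sketch. This is not part of the methodology itself: the `Stage` enum, the `run_lifecycle` function, and the `evaluate` callback are hypothetical names introduced only to show how the lifecycle loops back to the first stage until the evaluation step accepts the model. The sketch also assumes a strictly sequential pass through the stages, whereas in practice you may move back and forth between them.

```python
from enum import Enum


class Stage(Enum):
    """The five discovery lifecycle stages, in their nominal order."""
    UNDERSTAND_BUSINESS_NEEDS = 1
    ACQUIRE_AND_UNDERSTAND_DATA = 2
    BUILD_AND_REFINE_MODEL = 3
    EVALUATE_MODEL = 4
    OPERATIONALIZE_MODEL = 5


def run_lifecycle(evaluate, max_iterations=5):
    """Cycle through the stages until `evaluate` accepts the model.

    `evaluate` is a hypothetical callback: it receives the iteration
    number (starting at 1) and returns True when the model is good
    enough to operationalize. Returns the ordered list of stages visited.
    """
    visited = []
    for iteration in range(1, max_iterations + 1):
        # Each iteration revisits the first four stages in order.
        visited += [
            Stage.UNDERSTAND_BUSINESS_NEEDS,
            Stage.ACQUIRE_AND_UNDERSTAND_DATA,
            Stage.BUILD_AND_REFINE_MODEL,
            Stage.EVALUATE_MODEL,
        ]
        if evaluate(iteration):
            # Only an accepted model moves on to operationalization.
            visited.append(Stage.OPERATIONALIZE_MODEL)
            break
        # Otherwise, loop back and refine the business questions.
    return visited


# Example: the model is accepted on the second iteration.
history = run_lifecycle(lambda iteration: iteration >= 2)
print([stage.name for stage in history])
```

In this example the first four stages run twice before the model is operationalized. If the callback never accepts the model, the loop stops after `max_iterations` without reaching the final stage.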