Data Science Framework: high-level introduction, description of phases and basic workflow.
The Polidata Data Science Framework is the result of working with data and analytics challenges over a number of years. The simple and pragmatic steps intentionally hide the complexity behind each phase so that the basic principle is clear to both business and technical leaders and team members. The main purpose is to get everyone on the same page before any implementation steps are planned and executed.
The framework has matured over the last couple of years based on client engagements. Planning and implementation are constantly evolving because the data science industry moves fast. It is up to us to stay current and to keep our clients current as part of the same process.
There are four phases in the framework, as shown in the image below. These steps are sequential in nature because each builds on its predecessor. During a complex program there are building blocks that affect a number of dependent projects, such as the creation of a data warehouse. The opposite is also true: specialised tasks in many cases relate to one project only. Regardless of overlap or specialisation, each project has to take these steps through to completion.
The four phases or steps in the Data Science Framework represent the following:
- Gather - identifying and collecting all relevant data available internally and externally
- Manage - architecture and construction of platforms that ingest, store and make data available for analysis
- Analyse - shaping and modelling of data, analysis algorithm selection and fine tuning of results
- Insight - translation of technical results into business insights that support strategic and operational decisions
Each phase is explained in more detail when you follow the links.
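To make the sequential nature of the phases concrete, here is a minimal Python sketch in which each phase consumes the output of its predecessor. It is only an illustration under stated assumptions: the function names, the `STEPS` mapping, `run_project` and the sample sales rows are all invented for this example and are not part of the framework itself.

```python
from enum import IntEnum
from typing import Callable, Dict, List


class Phase(IntEnum):
    """The four framework phases, numbered in the order they run."""
    GATHER = 1
    MANAGE = 2
    ANALYSE = 3
    INSIGHT = 4


# Hypothetical stand-ins for real phase work. Each function takes the
# previous phase's output and returns its own.
def gather(_: object) -> List[dict]:
    # Invented sample rows standing in for internal and external sources.
    return [{"region": "north", "sales": 120}, {"region": "south", "sales": 80}]


def manage(rows: List[dict]) -> Dict[str, int]:
    # Store the rows in a queryable shape (here, a dict keyed by region).
    return {row["region"]: row["sales"] for row in rows}


def analyse(store: Dict[str, int]) -> Dict[str, float]:
    # Compute each region's share of total sales.
    total = sum(store.values())
    return {region: sales / total for region, sales in store.items()}


def insight(shares: Dict[str, float]) -> str:
    # Translate the technical result into a business statement.
    top = max(shares, key=shares.get)
    return f"{top} contributes {shares[top]:.0%} of sales"


# Assumed mapping from phase to the function that performs it.
STEPS: Dict[Phase, Callable] = {
    Phase.GATHER: gather,
    Phase.MANAGE: manage,
    Phase.ANALYSE: analyse,
    Phase.INSIGHT: insight,
}


def run_project() -> str:
    """Run all four phases in order, feeding each result to the next."""
    result = None
    for phase in sorted(STEPS):
        result = STEPS[phase](result)
    return result


if __name__ == "__main__":
    print(run_project())
```

Keeping the phases in an ordered mapping means a real project could swap any stand-in for its own implementation without changing the order in which the phases run.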
In some cases, companies are already at a certain maturity level with data warehouses or data lakes already in place. In such cases the steps take the form of a quality review instead of the actual build process. We can inherit and use the investments already made so that no traction is lost.
As mentioned above, the framework phases are sequential in nature. In a real-world implementation, however, it is often necessary to revisit earlier steps because of changes in demand or the discovery of new features in the data. The life-cycle expressed by the framework also never really comes to an end: changes in the environment expose new features, require models to be adapted and retrained, and require visualisations to be updated.
Data Science is a voyage of discovery, so the journey should not come to an end. Every new insight is a competitive advantage.