Data Science Lifecycle: Five Stages for Intelligent Applications

YouTube video ID: LeEj3S4Okao

Source: YouTube video by Microsoft Developer

Data science, machine learning, and AI play a central role in building intelligent applications. This article explains the data science lifecycle used to structure data science projects intended to become part of those applications. The lifecycle is composed of five major stages that guide a project from problem definition to customer acceptance.

The five-stage lifecycle

"The lifecycle is designed for data science projects that are intended to ship as part of your intelligent applications." It is composed of five major stages: business understanding, data acquisition and understanding, modeling, deployment, and customer acceptance. Together these stages form a repeatable process for taking a project from idea to production.

Business understanding

The first stage focuses on defining objectives by understanding business problems with stakeholders. Project teams identify the questions that need answering and clarify success criteria. Teams also identify relevant data sources that can provide the information needed to address those business questions.

Data acquisition and understanding

The goals of this stage are to produce a clean, high-quality dataset and to develop a data pipeline architecture. Typical steps include ingesting data into the analytic environment and exploring that data to check its quality. Setting up a data pipeline lets the system score new or regularly refreshed data, which both downstream modeling and production depend on.
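The ingest, explore, and clean steps can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the video's method: the sample data, the column names, and the helper functions `load_rows`, `missing_rate`, and `drop_incomplete` are all assumptions made up for this example.

```python
import csv
import io


def load_rows(csv_text):
    """Ingest CSV text into a list of dicts, one per record."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def is_present(value):
    """A value counts as present only if it is a non-blank string."""
    return isinstance(value, str) and value.strip() != ""


def missing_rate(rows):
    """Explore quality: fraction of missing values per column."""
    if not rows:
        return {}
    counts = {column: 0 for column in rows[0]}
    for row in rows:
        for column in counts:
            if not is_present(row.get(column)):
                counts[column] += 1
    return {column: count / len(rows) for column, count in counts.items()}


def drop_incomplete(rows):
    """Clean: keep only rows in which every column has a value."""
    return [row for row in rows if all(is_present(v) for v in row.values())]


# Hypothetical sample data; a real project would read from its own sources.
SAMPLE = """customer_id,visits,purchases
1,12,3
2,,1
3,7,
4,5,2
"""

rows = load_rows(SAMPLE)
rates = missing_rate(rows)
clean = drop_incomplete(rows)
print(rates)
print(len(clean), "of", len(rows), "rows kept")
```

Dropping incomplete rows is only one possible cleaning policy; imputing values or fixing the upstream source may suit a given project better.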

Modeling

Modeling centers on feature engineering, model training, and determining whether a model is suitable for production. Steps include creating features from raw data to train models and finding an accurate model that answers the defined question. Teams compare success metrics across candidate models and determine if the chosen model is ready for production deployment.
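The comparison step can be illustrated with a small standard-library sketch. Everything here is an assumption for illustration: the `purchase_rate` feature, the two rule-based candidates (stand-ins for trained models), the held-out examples, and the 0.8 accuracy threshold are hypothetical, and accuracy is just one possible success metric.

```python
def add_features(record):
    """Feature engineering: derive a model input from raw fields."""
    visits = record["visits"]
    rate = record["purchases"] / visits if visits else 0.0
    return {**record, "purchase_rate": rate}


def accuracy(model, examples):
    """Fraction of examples for which the model predicts the true label."""
    correct = sum(1 for features, label in examples if model(features) == label)
    return correct / len(examples)


def pick_model(candidates, examples, threshold):
    """Score every candidate, pick the best, and check production readiness."""
    scores = {name: accuracy(model, examples) for name, model in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores, scores[best] >= threshold


# Hypothetical held-out data: (raw record, label), label 1 = repeat buyer.
RAW = [
    ({"visits": 10, "purchases": 5}, 1),
    ({"visits": 20, "purchases": 1}, 0),
    ({"visits": 4, "purchases": 2}, 1),
    ({"visits": 8, "purchases": 0}, 0),
    ({"visits": 15, "purchases": 6}, 1),
]
holdout = [(add_features(record), label) for record, label in RAW]

# Fixed rules standing in for trained candidate models.
candidates = {
    "visits_rule": lambda f: int(f["visits"] >= 10),
    "rate_rule": lambda f: int(f["purchase_rate"] >= 0.25),
}

best, scores, ready = pick_model(candidates, holdout, threshold=0.8)
print(best, scores, ready)
```

In practice the candidates would be trained on one split of the data and compared on a separate one, and the threshold would come from the success criteria agreed during business understanding.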

Deployment

The deployment stage moves the model and its pipeline into a production environment where applications can consume the results. The recommended method is to put the models behind an open API: "You need to expose them with an open API interface. The interface enables the model to be easily consumed from different types of applications."
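As a rough sketch of what exposing a model through an API can look like, the example below wraps a placeholder scoring function in an HTTP endpoint using only Python's standard library. The `/score` path, the feature names, the weights, and the `make_server` helper are assumptions for illustration; the video does not prescribe a framework, and `http.server` is not intended for production use.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Placeholder model: a fixed linear score with hypothetical weights."""
    weights = {"visits": 0.5, "purchases": 2.0}
    return sum(weights.get(name, 0.0) * float(value)
               for name, value in features.items())


class ScoreHandler(BaseHTTPRequestHandler):
    """Answers POST /score with a JSON body such as {"score": 4.0}."""

    def do_POST(self):
        if self.path != "/score":
            self.send_error(404, "unknown endpoint")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            features = json.loads(self.rfile.read(length))
            body = json.dumps({"score": predict(features)}).encode("utf-8")
        except (ValueError, TypeError, AttributeError):
            # Malformed JSON, a non-object payload, or non-numeric values.
            self.send_error(400, "expected a JSON object of numeric features")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; real services should log requests


def make_server(port=8000):
    """Build (but do not start) a local server for the scoring endpoint."""
    return HTTPServer(("127.0.0.1", port), ScoreHandler)
```

Calling `make_server(8000).serve_forever()` starts the endpoint locally. Any client that can send an HTTP POST with a JSON body, such as a website, a dashboard, or a back-end job, can then request a score without knowing how the model is implemented.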

Consumption scenarios

Once exposed via an API, models and pipelines can be consumed by a variety of applications, including websites, spreadsheets, dashboards, and back-end applications. This variety allows model outputs to be integrated into both user-facing and automated systems.

Finalization and customer acceptance

The final stage confirms that the pipeline, model, and deployment satisfy stakeholder objectives. Customer acceptance closes the loop by validating that the delivered solution meets the defined business goals. When stakeholders confirm requirements are met, the project transitions from delivery to operational use.
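One way to make this confirmation concrete is to record the success criteria as explicit targets and check measured results against them. The sketch below is illustrative only: the criterion names, target values, and measurements are hypothetical, and real acceptance usually also involves stakeholder review rather than an automated check alone.

```python
def meets(kind, target, value):
    """Check one measurement against a 'min' or 'max' target."""
    if kind == "min":
        return value >= target
    if kind == "max":
        return value <= target
    raise ValueError(f"unknown criterion kind: {kind!r}")


def acceptance_report(criteria, measured):
    """Map each criterion to True/False; a missing measurement fails."""
    return {
        name: name in measured and meets(kind, target, measured[name])
        for name, (kind, target) in criteria.items()
    }


# Hypothetical criteria agreed with stakeholders during business understanding.
criteria = {
    "accuracy": ("min", 0.85),
    "p95_latency_ms": ("max", 200),
}
# Hypothetical measurements taken from the deployed system.
measured = {"accuracy": 0.91, "p95_latency_ms": 240}

report = acceptance_report(criteria, measured)
accepted = all(report.values())
print(report, "accepted:", accepted)
```

Here the model meets the accuracy target but the deployment misses the latency target, so the solution would go back for another iteration instead of being handed over.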

Lifecycle summary table

Stage | Primary focus and steps
Business understanding | Define objectives with stakeholders; identify data sources.
Data acquisition & understanding | Ingest and explore data; clean dataset; set up data pipeline.
Modeling | Feature engineering; train models; compare metrics; readiness.
Deployment | Deploy pipeline and model; expose via open API.
Customer acceptance | Confirm pipeline, model, and deployment meet objectives.

The lifecycle is intended to be repeatable and to ensure projects deliver measurable value as part of intelligent applications. Learn more at aka.ms/datasciencelifecycle.

  Takeaways

  • Data science projects for intelligent applications follow a structured lifecycle of five major stages.
  • Business understanding requires defining objectives with stakeholders and identifying relevant data sources.
  • Data acquisition and understanding aims to produce a clean dataset and a pipeline for scoring new or regularly refreshed data.
  • Modeling focuses on feature engineering, training models, comparing metrics, and assessing production readiness.
  • Deployment exposes models via an open API so websites, spreadsheets, dashboards, and back-end apps can consume them.
