End-to-End Data Science with IBM Watson Studio and Auto AI


YouTube video ID: 3DElV51FqL8

Source: YouTube video by IBM Developer


Pawa Siddiki and Khalil Faraj, Developer Advocates at IBM, presented a workshop that walked through the data science lifecycle and a hands-on example using IBM Watson Studio. The agenda covered the stages of data science, distinctions among AI/ML/DL, types of data analysis, data preparation and exploration, the data science pipeline, model types, Auto AI, and a practical customer churn prediction exercise. Housekeeping noted that an IBM Cloud account is required for the hands-on portion and that guides and assets were provided.

Understanding AI, ML, and Deep Learning

Artificial Intelligence (AI) was defined as any technique that enables computers to mimic human intelligence or aid human decision-making. Machine Learning (ML) was described as a subset of AI in which computers learn from data. Deep Learning (DL) was framed as a subset of ML that uses multiple layers of neural networks to emulate aspects of the human brain. Data science ties these concepts to practical business challenges: "Data science it's basically the study of data you analyze it you get these statistics you identify patterns and then you apply them on business challenges that you focus on."

Types of Data Analysis

Four types of analysis were outlined: descriptive, diagnostic, predictive, and prescriptive. Descriptive analysis answers what happened and is exemplified by dashboards. Diagnostic analysis explains why something happened, such as analyzing a social media campaign. Predictive analysis forecasts what will happen, and prescriptive analysis recommends actions, for example through recommendation engines.

Type | Purpose | Example
Descriptive | What happened | Dashboards
Diagnostic | Why it happened | Campaign analysis
Predictive | What will happen | Forecasting future sales
Prescriptive | What to do | Recommendation engines

Data Scientist's Role

A data scientist was characterized as someone who wears many hats and does not deal only with machine learning models. Responsibilities span understanding domain problems, preparing data, and selecting or building models. The role blends domain knowledge with technical skills to produce reliable models and sound assumptions about the future.

Types of Data

Data types were categorized as structured, semi-structured, and unstructured. Each type requires different handling during preparation and analysis. Understanding the data type is a prerequisite to choosing appropriate cleaning, transformation, and modeling techniques.

Data Preparation

Data preparation steps included cleaning, transformation, and enrichment. Data cleaning removes bad formats, handles missing data, eliminates useless variables, and corrects wrong values. Data transformation changes formats and column types (for example, integer, string, boolean), and derived variables can be created from existing fields (such as age from an ID). Normalization, handling inconsistent spellings or nicknames, and feature value rescaling were highlighted. Data enrichment refers to looking up and adding information, for instance deriving age from a profile record.
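The cleaning and transformation steps described above can be sketched in pandas. The frame and column names below are made up for illustration; this is not the workshop dataset:

```python
import pandas as pd

# Hypothetical raw customer records (not the workshop's customer_churn.csv)
df = pd.DataFrame({
    "name": ["Alice", "alice ", "Bob", None],
    "age": ["34", "34", "29", "41"],
    "plan": ["gold", "Gold", "silver", "gold"],
})

# Cleaning: handle missing data and trim stray whitespace
df = df.dropna()
df["name"] = df["name"].str.strip()

# Transformation: fix column types and normalize inconsistent spellings
df["age"] = df["age"].astype(int)
df["plan"] = df["plan"].str.lower()

# Feature value rescaling: a simple min-max normalized age column
df["age_scaled"] = (df["age"] - df["age"].min()) / (df["age"].max() - df["age"].min())
```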

Data Exploration

Data exploration relies on visualization to surface insights. Typical visualizations described were bar charts, line charts, scatter plots, bubble charts, and pie charts. Visual exploration supports hypothesis generation and guides feature selection for modeling.

Data Science Pipeline

The pipeline was presented as an iterative sequence: raw data → processing (cleaning, transformation) → exploration → model design → learning → verification → update/improve → deployment. The process repeats until the model is satisfactory. Domain knowledge and expert knowledge both play roles: domain knowledge informs the use case, while expert knowledge helps build reliable models. The stages map to common tasks such as Data Cleaning, Exploratory Data Analysis (EDA), and ML/Data Modeling, culminating in model deployment.
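The processing → learning → verification loop above can be mirrored with a scikit-learn `Pipeline`. This is a minimal sketch on synthetic data, not the workshop's Watson Studio flow:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

# Synthetic stand-in for "raw data" (the workshop used customer_churn.csv)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Processing and learning chained as one reproducible pipeline
pipe = Pipeline([
    ("scale", StandardScaler()),      # transformation step
    ("model", LogisticRegression()),  # learning step
])
pipe.fit(X_train, y_train)

# Verification: hold-out accuracy; if unsatisfactory, update and repeat
score = pipe.score(X_test, y_test)
```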

Machine Learning Models

Machine learning model categories were outlined as unsupervised, supervised, and reinforcement learning. Unsupervised learning focuses on clustering—grouping data based on features. Supervised learning includes classification and regression, targeting categorical and continuous outcomes respectively. Reinforcement learning involves agents, environments, rewards, and observations and is based on actions and feedback.

Category | Focus | Example Task
Unsupervised | Clustering | Grouping similar customers
Supervised | Classification / Regression | Predict churn / predict price
Reinforcement | Agent-based learning | Reward-driven policies

Supervised Learning Deep Dive

Classification was described as predicting a categorical or qualitative target label, while regression predicts a continuous numerical target. Examples included predicting a class label such as "apple" or "cupcake" for classification, and predicting a numerical price for regression. Feature variables and target labels are central to supervised learning workflows.
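The classification/regression distinction can be illustrated with scikit-learn; the tiny datasets and feature meanings below are invented for the example:

```python
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier

# Classification: categorical target (0 = "apple", 1 = "cupcake")
# Hypothetical features: weight in grams, sugar-coated flag
X_cls = [[150, 0], [160, 0], [90, 1], [85, 1]]
y_cls = [0, 0, 1, 1]
clf = DecisionTreeClassifier().fit(X_cls, y_cls)
label = clf.predict([[155, 0]])[0]   # predicts the "apple" class (0)

# Regression: continuous numerical target (a price), feature = size
X_reg = [[1], [2], [3], [4]]
y_reg = [10.0, 20.0, 30.0, 40.0]
reg = LinearRegression().fit(X_reg, y_reg)
price = reg.predict([[5]])[0]        # a continuous value, ~50.0 here
```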

Unsupervised Learning Deep Dive

Clustering was presented as grouping records based on features without labeled outcomes. Once clusters are defined, unseen data can be placed into existing clusters for segmentation or downstream decisions.
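This group-then-assign idea can be sketched with k-means clustering (synthetic points standing in for customer features):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two obvious groups of hypothetical customer features
X = np.array([[1.0, 1.1], [0.9, 1.0], [1.1, 0.9],
              [8.0, 8.2], [7.9, 8.1], [8.1, 7.9]])

# Fit two clusters without any labeled outcomes
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Unseen records are placed into the nearest existing cluster
a = km.predict([[8.0, 8.0]])[0]
b = km.predict([[1.0, 1.0]])[0]
# a and b fall into different clusters
```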

Machine Learning vs. Deep Learning

Deep learning was characterized by multiple hidden layers that learn features automatically; it is computationally intensive and time-consuming. Traditional machine learning requires feature engineering and often has models available via APIs. The trade-off centers on automated feature learning in deep learning versus manual feature engineering in traditional ML.

Auto AI

Auto AI was introduced as automated artificial intelligence built on an AutoML framework. The outlined steps are: upload data, prepare data, select the modeling task, perform hyperparameter optimization, and apply feature engineering. Benefits emphasized include faster model building, bridging the skills gap, discovering use cases, and rapid deployment. It was noted that Auto AI visualizes how it generates pipelines and can build models "without any line of code basically you want to do it that easy."

Details included the Auto AI runtime: typical experiment duration is 2–4 minutes, and for each selected algorithm Auto AI generates four pipelines (a base model, two hyperparameter optimization variants, and a feature engineering variant). Quotes emphasized automation of the time-consuming aspects: "Hyper parameter optimization feature engineering are most are the most time consuming aspects in a data science pipeline and this automation with this automation you can actually rapidly develop them."

Workshop Use Case: Customer Churn Prediction

The practical use case was predicting which customers are likely to stop using a service. Tools used in the demonstration were IBM Watson Studio, Data Refinery, and Auto AI, with a customer_churn.csv dataset. The workshop showed an end-to-end flow from data ingestion and refining to Auto AI experiments and deployment.

Hands-on Session (Khalil Faraj)

Khalil demonstrated setting up IBM Cloud resources needed for the hands-on: Watson Studio, Cloud Object Storage, and a Machine Learning instance. In Watson Studio, a project was created and the customer_churn.csv file uploaded and previewed. Data Refinery was used to refine the churn column: converting boolean strings to integers via conditional replace (True/False → 1/0) and saving the refined dataset as a job.
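The Data Refinery conditional replace (True/False → 1/0) has a one-line pandas equivalent, sketched here on a made-up frame rather than the actual customer_churn.csv:

```python
import pandas as pd

# Stand-in for customer_churn.csv, where churn arrives as boolean strings
df = pd.DataFrame({"churn": ["True", "False", "True", "False"]})

# Conditional replace: "True"/"False" strings become 1/0 integers
df["churn"] = df["churn"].map({"True": 1, "False": 0}).astype(int)
```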

An Auto AI experiment was created and associated with the Machine Learning service instance. The cleaned data was uploaded, the churn column selected as the target, and Auto AI auto-parsed the features. Settings included binary classification with accuracy as the metric. Auto AI generated model pipelines for selected algorithms such as LGBM Classifier and XGB Classifier; pipelines included base models, hyperparameter optimization, and feature engineering variants. The run produced progress and relationship maps, and the results were analyzed for accuracy, other metrics, and feature importance. The best performing model was saved.

Model deployment involved promoting the saved model to a deployment space and creating an online deployment. The deployment was tested via the Watson Studio interface and the API reference. An API key was generated through IBM Cloud IAM for application integration. The workshop demonstrated a test where a male customer with specified attributes was predicted as 'False' for churn with a confidence score of 0.79.
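Calling an online deployment from an application follows the general Watson Machine Learning v4 scoring shape (a `fields`/`values` payload posted with an IAM bearer token). The field names and values below are hypothetical; the real field list comes from the deployment's API reference page:

```python
import json

# Hypothetical feature fields for the churn model (illustrative only)
fields = ["gender", "age", "est_income", "usage"]
values = [["M", 35, 40000.0, 120]]

# General shape of a Watson Machine Learning v4 scoring payload
payload = {"input_data": [{"fields": fields, "values": values}]}

def score(deployment_url, iam_token):
    """Sketch of calling the online deployment (requires `requests`).

    The token would be obtained from IBM Cloud IAM using the generated
    API key; the network call is not executed here.
    """
    import requests
    headers = {"Authorization": f"Bearer {iam_token}",
               "Content-Type": "application/json"}
    return requests.post(deployment_url, headers=headers, json=payload).json()

body = json.dumps(payload)
```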

Practical Summary of Steps Demonstrated

The workshop sequence as shown in the hands-on session consolidated into these steps: set up IBM Cloud resources, create a Watson Studio project, upload and preview data, refine data in Data Refinery, create and run an Auto AI experiment, analyze and save the best model, promote and create an online deployment, and test the deployed model via the interface or API. Presenters summarized: "So it was a very simple workshop what we did was loaded our data onto watson studio used data refined we prepared the data created an auto ai experiment we ran analyze the auto ai job selected the best model deployed the model to watson studio and we made our prediction."

Takeaways

  • AI, ML, and Deep Learning are nested concepts where data science applies them to analyze data and solve business challenges.
  • Data preparation—cleaning, transformation, derived variables, normalization, and enrichment—is essential before modeling.
  • The data science pipeline is iterative: raw data to processing, exploration, model design, learning, verification, and deployment.
  • Auto AI automates model generation, hyperparameter optimization, and feature engineering, producing multiple pipelines per algorithm.
  • The hands-on workflow demonstrated loading data, refining it in Data Refinery, running Auto AI experiments, saving the best model, deploying it, and testing via API.

