Model Deployment · 8 min read

ML Pipeline

From raw data to production predictions

data collection: First step · garbage in, garbage out
training: Core step · model learns patterns
deployment: Final step · model serves real users

Think of a car assembly line. Raw materials arrive at one end — sheets of metal, rubber, glass. They pass through stations: cutting, shaping, welding, painting, wiring, quality inspection. A finished car rolls off the other end.

If the metal is rusted, the car will be weak. If the welding station is miscalibrated, doors won't close. If quality inspection is skipped, defective cars reach customers. Every station matters.

An ML pipeline works the same way. Raw data flows in at one end, passes through cleaning, transformation, training, and evaluation stations, and predictions come out the other end. Skip a step or do one poorly, and the whole thing falls apart.

The stages of an ML pipeline

1. Data Collection

You need data. Lots of it. This might come from databases, APIs, web scraping, sensors, or manual labeling. The quality of your data caps the quality of your model.

2. Data Cleaning

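Real data is messy: missing values, duplicates, typos, inconsistent formats ("New York" vs "NY" vs "new york"). Practitioners often say this stage takes most of a project's time; "80% of the work" is a common rule of thumb rather than a measured figure.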
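As a rough illustration of the problems listed above, here is a minimal pandas sketch. The table, the `CITY_ALIASES` mapping, and the `clean` helper are all hypothetical and exist only for this example; a real project would need its own rules.

```python
import pandas as pd

# Hypothetical raw records with the kinds of problems described above.
raw = pd.DataFrame({
    "city": ["New York", "NY", "new york", "Boston", "Boston"],
    "age": [34, None, 29, 41, 41],
})

# Assumed alias table; a real project would maintain its own mapping.
CITY_ALIASES = {"ny": "New York", "new york": "New York", "boston": "Boston"}


def clean(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Normalize inconsistent spellings: trim, lowercase, map to a canonical name.
    key = out["city"].str.strip().str.lower()
    out["city"] = key.map(CITY_ALIASES).fillna(out["city"])
    # Drop exact duplicate rows.
    out = out.drop_duplicates().reset_index(drop=True)
    # Fill missing ages with the median of the observed ages.
    out["age"] = out["age"].fillna(out["age"].median())
    return out


cleaned = clean(raw)
print(cleaned)
```

Filling with a median computed over the whole table is fine for a one-off cleanup, but when the value feeds a model it is safer to learn it from the training rows only, as the SimpleImputer step in the pipeline further down does.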

3. Feature Engineering

Transform raw data into features the model can use: scale numbers, encode categories, create derived features, handle text/images.
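One possible way to express scaling, category encoding, and a derived feature with scikit-learn is a ColumnTransformer. The column names and values below are invented for illustration, and `income_per_person` is just an example of a derived feature.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical table: two numeric columns and one categorical column.
df = pd.DataFrame({
    "income": [30_000.0, 50_000.0, 70_000.0, 90_000.0],
    "household_size": [1, 2, 2, 3],
    "city": ["Boston", "New York", "Boston", "Chicago"],
})

# Derived feature: combine two raw columns into a more informative one.
df["income_per_person"] = df["income"] / df["household_size"]

preprocess = ColumnTransformer(
    [
        # Scale numeric columns to zero mean and unit variance.
        ("num", StandardScaler(), ["income", "income_per_person"]),
        # Encode categories as one-hot columns; unseen categories become all zeros.
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ],
    sparse_threshold=0,  # always return a dense array, for easy inspection
)

features = preprocess.fit_transform(df)
print(features.shape)  # 2 scaled numeric columns + 3 one-hot city columns
```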

4. Train/Test Split

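Hold out data the model won't see during training. This is your unbiased evaluation set. Any preprocessing that learns from the data (scaling statistics, imputation values, category vocabularies) should be fit on the training portion only, so in code the split happens before those transformers are fit.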
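A small sketch of a hold-out split on the Iris dataset. The `stratify=y` argument is an addition for this example: it keeps the class proportions the same in both parts, which is usually a sensible default for classification.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% of the rows; stratify so each class keeps its share.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)
print(np.bincount(y_test))  # number of test rows per class
```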

5. Model Training

Pick an algorithm, feed it the training data, tune hyperparameters. This is what most people think ML is — but it's actually only one step in the chain.
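Hyperparameter tuning can be sketched with scikit-learn's GridSearchCV, which tries each combination in a grid using cross-validation on the training data. The grid below is deliberately tiny and its values are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Illustrative grid: 2 x 2 = 4 candidate settings.
param_grid = {"n_estimators": [25, 50], "max_depth": [2, None]}

# Each candidate is scored with 3-fold cross-validation on the training set only.
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=3)
search.fit(X_train, y_train)

print(search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.3f}")
```

The test set stays untouched during the search, so it can still serve as the final check on `search.best_estimator_`.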

6. Evaluation

Test the model on held-out data. Check accuracy, precision, recall, F1. If it's not good enough, go back to step 2 or 3 and iterate.
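To make the four metrics concrete, here is a small worked example on hand-written binary labels (invented for illustration). The predictions contain 2 true positives, 2 false negatives, 1 false positive, and 5 true negatives.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Invented labels: 4 actual positives, 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

accuracy = accuracy_score(y_true, y_pred)  # (TP + TN) / total = 7 / 10
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary"
)
# precision = TP / (TP + FP) = 2 / 3
# recall    = TP / (TP + FN) = 2 / 4
# F1        = harmonic mean of precision and recall = 4 / 7

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

Accuracy looks acceptable here even though the model misses half of the positives, which is why recall and precision are worth checking alongside it.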

7. Deployment

Ship the model to production where it serves real predictions. This means building an API, monitoring performance, handling edge cases, and planning for retraining.
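The first deployment step is usually persisting the trained pipeline so a separate serving process can load it. The sketch below uses joblib for that; the file name and the `predict_one` helper are hypothetical, and a real service would wrap something like it in a web framework and add logging and monitoring.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Training side: fit the pipeline and write it to disk.
X, y = load_iris(return_X_y=True)
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", RandomForestClassifier(n_estimators=20, random_state=42)),
])
pipe.fit(X, y)

model_path = Path(tempfile.mkdtemp()) / "iris_pipeline.joblib"  # hypothetical path
joblib.dump(pipe, model_path)

# Serving side: load the artifact once at startup.
loaded = joblib.load(model_path)


def predict_one(features):
    """Validate one request, then return the predicted class as an int."""
    if len(features) != 4:
        raise ValueError("expected 4 numeric features")
    return int(loaded.predict([features])[0])


print(predict_one([5.1, 3.5, 1.4, 0.2]))
```

Only load joblib or pickle files from sources you trust, since unpickling can execute arbitrary code.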

A Complete ML Pipeline in Code

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.datasets import load_iris

# 1. Load data
X, y = load_iris(return_X_y=True)
print(f"Raw data shape: {X.shape}")

# 2-3. Build a pipeline: impute → scale → train
pipe = Pipeline([
    ('imputer', SimpleImputer(strategy='mean')),  # Handle missing values
    ('scaler', StandardScaler()),                  # Standardize: zero mean, unit variance
    ('model', RandomForestClassifier(n_estimators=100, random_state=42))
])

# 4. Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 5. Train the pipeline
pipe.fit(X_train, y_train)

# 6. Evaluate
train_acc = pipe.score(X_train, y_train)
test_acc = pipe.score(X_test, y_test)
# Cross-validation refits a fresh copy of the whole pipeline on each fold
cv_scores = cross_val_score(pipe, X, y, cv=5)

print(f"Train accuracy: {train_acc:.3f}")
print(f"Test accuracy:  {test_acc:.3f}")
print(f"CV mean:        {cv_scores.mean():.3f} ± {cv_scores.std():.3f}")

Output

Raw data shape: (150, 4)
Train accuracy: 1.000
Test accuracy:  0.967
CV mean:        0.960 ± 0.022

Why pipelines matter

Without a pipeline, data preprocessing and model training happen in scattered scripts. This leads to:

Data leakage: accidentally using test data during preprocessing (e.g., scaling with test set statistics)
Inconsistency: applying different transformations during training and prediction
Messy code: hard to reproduce or debug

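A pipeline bundles everything together: when you call pipe.fit(X_train, y_train), it fits the imputer and the scaler on the training data and then trains the model. When you call pipe.predict(X_test), it applies those same fitted transformations and then predicts. The test set never influences preprocessing, and training and prediction always use the same steps.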
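The leakage point can be checked directly. In this sketch a scaler fit by hand on all rows (the leaky version) is compared with the scaler inside a pipeline that was fit on the training split only. LogisticRegression is used here simply as a model that is sensitive to scaling; it is not part of the example above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Leaky: statistics computed from every row, including the test rows.
leaky_scaler = StandardScaler().fit(X)

# Safe: the pipeline fits its scaler on whatever is passed to fit().
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])
pipe.fit(X_train, y_train)
safe_scaler = pipe.named_steps["scaler"]

# The pipeline's scaler matches the training-set means exactly...
print(np.allclose(safe_scaler.mean_, X_train.mean(axis=0)))
# ...while the hand-fit scaler has absorbed information from the test rows.
print(np.allclose(leaky_scaler.mean_, safe_scaler.mean_))
```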

The deployment gap

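Getting a model working in a Jupyter notebook is, by a common rule of thumb, about 10% of the job. Deploying it to production, where it must handle real traffic, be monitored for drift, be retrained on new data, and fail gracefully, is the other 90%. That's why tools like MLflow (experiment tracking and model packaging), Kubeflow (ML workflows on Kubernetes), and Airflow (general workflow scheduling) exist.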
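Drift monitoring can start very simply. The sketch below flags features whose mean in a batch of live data has moved far from the training mean. The `drifted_features` helper and its threshold of 3.0 are assumptions made for illustration; production systems typically use more robust statistical tests.

```python
import numpy as np


def drifted_features(train, live, threshold=3.0):
    """Return indices of features whose live mean is far from the training mean.

    Distance is measured in standard errors of the live-batch mean; the
    default threshold of 3.0 is an arbitrary illustrative choice.
    """
    train = np.asarray(train, dtype=float)
    live = np.asarray(live, dtype=float)
    mu = train.mean(axis=0)
    sigma = train.std(axis=0, ddof=1)
    standard_error = sigma / np.sqrt(len(live))
    z = np.abs(live.mean(axis=0) - mu) / standard_error
    return np.flatnonzero(z > threshold).tolist()


# Simulated data: three features, with a shift injected into the third.
rng = np.random.default_rng(0)
train = rng.normal(loc=0.0, scale=1.0, size=(1000, 3))
live = rng.normal(loc=0.0, scale=1.0, size=(200, 3))
live[:, 2] += 1.0

print(drifted_features(train, live))
```

A check like this would typically run on a schedule, with a non-empty result triggering an alert or a retraining job.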

Note: A common mistake in ML projects is spending all your time on the model and almost none on the data. In practice, data quality and feature engineering usually matter more to a model's performance than the choice of algorithm, which is often the least important decision.