Saturn
Self-hosted ML pipelines

SATURN LAB

Structure for your ML chaos.
Visual pipelines. Persistent experiments. Full lineage.

Get early access

Turn scattered Jupyter notebooks
into reproducible pipelines.

Senior data scientists lose hours every week to notebook chaos — lost metrics, unknown lineage, full pipeline re-runs for a one-line change. Saturn Lab solves all of it, without changing how you work.

Problem

"23 notebooks. Nobody knows which one runs in production."

Solution

A visual swim-lane canvas that makes the pipeline obvious.

Source → Processing → Experiment. Every notebook has a defined role. Execution order is explicit. One button runs everything in the correct sequence.
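
As a rough illustration of what "explicit execution order" can mean, the sketch below derives a run order from declared dependencies using Python's standard-library topological sorter. The node names come from the example canvas; the dependency dictionary and the `execution_order` helper are assumptions made for illustration, not Saturn Lab's internals.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline description: each node maps to the nodes it depends on.
# The dict layout is an assumption; only the node names appear on the canvas.
pipeline = {
    "churn_data.csv": set(),
    "feature_engineering.ipynb": {"churn_data.csv"},
    "xgb_classifier.ipynb": {"feature_engineering.ipynb"},
}


def execution_order(graph):
    """Return the nodes so that every dependency comes before its dependents."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(pipeline)
print(" -> ".join(order))
# churn_data.csv -> feature_engineering.ipynb -> xgb_classifier.ipynb
```

A topological sort also raises an error on circular dependencies, which is one plausible way a tool could refuse to run an ill-defined pipeline.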

localhost · Customer Analytics · ▶ Run pipeline

Data Sources: churn_data.csv (45,231 rows · 28 cols)
Notebooks: feature_engineering.ipynb, preprocessing.ipynb, xgb_classifier.ipynb
3 models · 1 in production

churn_data.csv · 45,231 rows · 28 cols · data/churn_data.csv
eng_features.parquet · 45,231 rows · 35 cols · output/eng_features.parquet
Processing: feature_engineering (feature_engineering.ipynb) · Compute rolling features, encode categoricals
Experiment: xgb_classifier (xgb_classifier.ipynb) · XGBoost with Optuna hyperparameter tuning · accuracy 0.947, auc_roc 0.981

Problem

"Tried a different aggregation. Results got worse. The previous approach? Gone — you overwrote the notebook."

Solution

Every run saved. Switch approaches freely.

Saturn captures the params and metrics of every run automatically. Try rolling_mean, compare it side-by-side with weighted_mean from run #4. The notebook changes — the history never disappears.
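
To make the side-by-side comparison concrete, here is a minimal sketch that stores two run records and reports the difference per metric. The `Run` class and `compare` function are hypothetical names; the parameter and metric values are taken from runs #7 and #4 in the experiments table.

```python
from dataclasses import dataclass


@dataclass
class Run:
    number: int
    params: dict
    metrics: dict


def compare(a, b):
    """Return {metric: (a_value, b_value, a - b)} for metrics both runs recorded."""
    shared = a.metrics.keys() & b.metrics.keys()
    return {
        name: (a.metrics[name], b.metrics[name], round(a.metrics[name] - b.metrics[name], 4))
        for name in sorted(shared)
    }


run7 = Run(7, {"aggregation": "rolling_mean"},
           {"accuracy": 0.9471, "auc_roc": 0.9813, "f1": 0.9204})
run4 = Run(4, {"aggregation": "weighted_mean"},
           {"accuracy": 0.9440, "auc_roc": 0.9791, "f1": 0.9172})

for metric, (new, old, delta) in compare(run7, run4).items():
    print(f"{metric:9} {new:.4f} vs {old:.4f} ({delta:+.4f})")
```

Because each run keeps its own copy of params and metrics, overwriting the notebook afterwards does not affect what was recorded.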

localhost / experiments
Experiments — Customer Analytics
Run  Status     aggregation    accuracy ↑       auc_roc ↑  f1 ↑    Duration
#7   ● success  rolling_mean   0.9471 (↑0.030)  0.9813     0.9204  14m 32s
#6   ● success  simple_sum     0.9170           0.9510     0.8940  11m 05s
#5   ● success  median         0.9380           0.9720     0.9110  12m 44s
#4   ● success  weighted_mean  0.9440           0.9791     0.9172  13m 18s
#3   ✓ cached                                                      0s

Problem

"Tweaked one hyperparameter. Re-running four hours of training."

Solution

Unchanged nodes skip. Only what changed re-runs.

Saturn hashes each node's input files and notebook code. If nothing changed since the last successful run, the node is marked cached and skipped instantly. A 4-hour pipeline becomes 12 minutes.
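
The caching rule described above can be sketched with a content hash. This is a minimal illustration under stated assumptions: the function names, the choice of SHA-256, and the idea of comparing against a stored fingerprint from the last successful run are all hypothetical, not a description of Saturn's actual implementation.

```python
import hashlib
import tempfile
from pathlib import Path


def node_fingerprint(notebook_code, input_paths):
    """Return one SHA-256 digest covering the notebook source and every input file."""
    digest = hashlib.sha256()
    digest.update(notebook_code.encode("utf-8"))
    for path in sorted(Path(p) for p in input_paths):
        digest.update(path.name.encode("utf-8"))
        with path.open("rb") as handle:
            # Read in 1 MiB chunks so large datasets never sit fully in memory.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
    return digest.hexdigest()


def is_cached(fingerprint, last_successful_fingerprint):
    """A node can be skipped only when nothing it depends on has changed."""
    return fingerprint == last_successful_fingerprint


with tempfile.TemporaryDirectory() as workdir:
    data = Path(workdir) / "churn_data.csv"
    data.write_text("customer_id,churned\n1,0\n")

    previous = node_fingerprint("df = load()", [data])
    unchanged = node_fingerprint("df = load()", [data])
    edited = node_fingerprint("df = load().dropna()", [data])

    print(is_cached(unchanged, previous))  # True
    print(is_cached(edited, previous))     # False
```

Hashing file contents rather than timestamps means that re-saving a file without changing it would not invalidate the cache.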

localhost · run #8 in progress
Customer Analytics · ◌ Running #8…

Recent runs
#8 main pipeline · now
#7 main pipeline · 2h ago
#6 main pipeline · 5h ago

churn_data.csv · 45,231 rows · 28 cols · data/churn_data.csv
eng_features.parquet · 45,231 rows · 35 cols · output/eng_features.parquet
Processing: feature_engineering (feature_engineering.ipynb) · inputs unchanged, skipped
Experiment: xgb_classifier (xgb_classifier.ipynb) · n_estimators=400, learning_rate=0.05

Problem

"Production model degraded. Nobody knows what data trained it."

Solution

Model → run → code → data. Always.

One call: saturn.register_model(clf, name="churn_v3"). Every version linked to its exact pipeline run, training node, and dataset snapshot. Diagnose drift in minutes, not days.
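
The model → run → code → data chain can be pictured as a small record attached to each model version. The sketch below is an in-memory stand-in written for illustration; `ModelVersion`, `LineageRegistry`, and `trace` are hypothetical names and say nothing about how `saturn.register_model` stores its links. The values come from the v3 entry in the model registry example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVersion:
    name: str
    version: int
    run: int      # pipeline run that produced the model
    node: str     # training node inside that run
    dataset: str  # dataset snapshot the node read


class LineageRegistry:
    """In-memory stand-in for a model registry; not Saturn's implementation."""

    def __init__(self):
        self._versions = {}

    def register(self, record):
        key = (record.name, record.version)
        if key in self._versions:
            raise ValueError(f"{record.name} v{record.version} is already registered")
        self._versions[key] = record

    def trace(self, name, version):
        """Walk model -> run -> code -> data for one registered version."""
        r = self._versions[(name, version)]
        return f"{r.name} v{r.version} -> run #{r.run} -> {r.node} -> {r.dataset}"


registry = LineageRegistry()
registry.register(
    ModelVersion("churn_predictor", 3, run=7, node="xgb_classifier", dataset="churn_data.csv v2")
)
print(registry.trace("churn_predictor", 3))
# churn_predictor v3 -> run #7 -> xgb_classifier -> churn_data.csv v2
```

Making the record immutable and rejecting duplicate versions keeps a registered lineage from being silently rewritten later.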

localhost / models
Model Registry — Customer Analytics
churn_predictor v3 · production
accuracy 0.9471 ★ · auc_roc 0.9813 ★ · f1 0.9204 ★
Run #7 · xgb_classifier · churn_data.csv v2 · 2.4 MB
Compare · Archive

churn_predictor v2 · staging
accuracy 0.9440 · auc_roc 0.9791 · f1 0.9172
Run #4 · xgb_classifier · churn_data.csv v1 · 2.1 MB
Compare · Promote

churn_predictor v1 · archived
accuracy 0.9312 · auc_roc 0.9680
Run #3 · xgb_classifier · churn_data.csv v1
Restore

Everything you need. Nothing you don't.
🎨
Fully customizable UI
Rename swim lanes, set your own colors, add custom node types. Your pipeline looks exactly like your mental model of the problem.
📓
One click to JupyterLab
Every node has an ↗ Open button that drops you straight into the JupyterLab environment you already know. No new tools to learn.
🗄️
Connect any database
Import data from PostgreSQL, MySQL, or MSSQL with a SQL query. The result is saved as a versioned Parquet snapshot and wired into your pipeline automatically.
👥
Multi-user with admin panel
Create team accounts, manage roles, force password changes on first login. Full Django admin panel included out of the box.
🔒
Fully self-hosted
One docker compose up command. Your models, data, and experiments stay on your infrastructure — never on a third-party cloud.
📋
Notebook templates
Bootstrap any node with built-in templates for EDA, feature engineering, classification, regression, and model registration. One click, ready to run.
Deployment

One Docker command.
Launching soon.

Self-hosted. Your data never leaves your infrastructure. Works on any server — AWS, Azure, on-prem, your laptop. PostgreSQL included.

Get in touch →

Free for personal use · Team plan $300/month flat · No per-seat pricing