🚀 SpaceX Launch Success Prediction with Machine Learning

Alvaro Mejia
May 28, 2025

How I predicted SpaceX rocket landing success using real launch data and interactive dashboards.

Role: Data Scientist

Team Size: Individual project

Duration: 3 weeks

Stack: Python · Pandas · SQL · Plotly Dash · Folium · REST APIs · Scikit-learn · SVM · Decision Trees · Logistic Regression

🧠 Project Overview

This project was developed as the final capstone for the IBM Data Science Professional Certificate. The objective was to analyze SpaceX's Falcon 9 launch data to uncover which factors influence the success of first-stage rocket landings and to build machine learning models that predict landing outcomes.

By integrating data science techniques — from API usage and web scraping to EDA, dashboarding, and machine learning — this project demonstrates an end-to-end approach to solving a real-world aerospace problem using data.

🔍 Problem Statement

SpaceX can significantly reduce launch costs by recovering and reusing the first stage of its Falcon 9 rockets. However, not all launches result in successful landings. Key questions addressed:

Can we predict whether a rocket will land successfully based on launch characteristics?
Which features (payload, orbit, launch site, etc.) are most influential?

📊 Data Collection & Processing

Sources:

Steps:

Data retrieved using Python requests and web scraping (BeautifulSoup).
Cleaned and filtered for Falcon 9 launches.
Missing values were handled (e.g., payload mass imputation).
Binary target variable created: landing_class (1 = success, 0 = fail).
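
The cleaning steps above can be sketched in pandas as follows. This is a minimal illustration, not the project's actual code: the sample rows, the column names (booster_version, payload_mass_kg, outcome), the "True ..." / "False ..." outcome strings, and the choice of mean imputation are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample rows; the column names and values are illustrative assumptions.
launches = pd.DataFrame({
    "booster_version": ["Falcon 1", "Falcon 9", "Falcon 9", "Falcon 9"],
    "payload_mass_kg": [20.0, 3000.0, None, 5000.0],
    "outcome": ["None None", "True ASDS", "False Ocean", "True RTLS"],
})


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Filter to Falcon 9, impute payload mass, and derive a binary target."""
    # Keep Falcon 9 launches only.
    f9 = df[df["booster_version"] == "Falcon 9"].copy()
    # Assumed strategy: fill missing payload mass with the mean of the known values.
    f9["payload_mass_kg"] = f9["payload_mass_kg"].fillna(f9["payload_mass_kg"].mean())
    # landing_class: 1 if the (assumed) outcome string starts with "True", else 0.
    f9["landing_class"] = f9["outcome"].str.startswith("True").astype(int)
    return f9.reset_index(drop=True)


clean = prepare(launches)
print(clean[["payload_mass_kg", "landing_class"]])
```

Mean imputation is only one option; a median or a per-orbit average could be swapped in on the fillna line without changing the rest of the sketch.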

📈 Exploratory Data Analysis (EDA)

Techniques used:

SQL queries on launch data for patterns (e.g., payload averages, site success rates).
Visual analysis with seaborn, Matplotlib, and pandas.
Key findings:
- Launch site, payload mass, and booster version are strongly associated with landing outcomes.
- KSC LC-39A had the highest success rate of any launch site (~77%).
- Landing success was highest for payloads between 2,000 and 5,500 kg and for booster version FT.
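
A query of the kind described above (payload averages and success rate per site) might look like the sketch below. It uses Python's built-in sqlite3 module with an in-memory table; the table name, column names, and the five sample rows are invented for illustration and do not reproduce the project's data or its reported figures.

```python
import sqlite3

# In-memory database with a hypothetical schema and made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE launches ("
    "launch_site TEXT, payload_mass_kg REAL, landing_class INTEGER)"
)
rows = [
    ("KSC LC-39A", 3500.0, 1),
    ("KSC LC-39A", 5300.0, 1),
    ("KSC LC-39A", 6000.0, 0),
    ("CCAFS SLC-40", 2500.0, 1),
    ("CCAFS SLC-40", 4500.0, 0),
]
conn.executemany("INSERT INTO launches VALUES (?, ?, ?)", rows)

# Because landing_class is 0/1, its average is the success rate.
query = """
SELECT launch_site,
       COUNT(*)             AS n_launches,
       AVG(payload_mass_kg) AS avg_payload_kg,
       AVG(landing_class)   AS success_rate
FROM launches
GROUP BY launch_site
ORDER BY success_rate DESC
"""
site_stats = conn.execute(query).fetchall()

for site, n, avg_payload, rate in site_stats:
    print(f"{site}: {n} launches, avg payload {avg_payload:.0f} kg, "
          f"success rate {rate:.0%}")
```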

🌍 Interactive Visualization

Tools: Folium, Plotly Dash

Folium Map:
- Visualized launch site locations.
- Overlaid launch outcomes and distances to coastlines and airports.
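
Distances like the ones drawn on the map can be computed from latitude/longitude pairs with the haversine formula. The sketch below is one common way to do it and is not necessarily how the project measured them; the function name and the example coordinates are placeholders, not surveyed locations.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Placeholder coordinates for a launch site and a nearby coastline point.
site = (28.5, -80.6)
coast = (28.5, -80.5)
print(f"Distance to coast: {haversine_km(*site, *coast):.2f} km")
```

The resulting value can then be attached to a map marker or line label in whatever mapping library is in use.
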
Dashboard (Plotly Dash):
- Pie Chart: Displays success rate per launch site.
- Payload Slider + Scatter Plot: Shows how landing outcome varies with payload mass, colored by booster version.
- Fully interactive — adjust site and payload range to explore outcomes.
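
Behind a dashboard like this, each control change boils down to filtering the data and re-aggregating it for the charts. The sketch below shows only that data step in pandas, leaving out the Dash layout and callback wiring. The function names, the "ALL" sentinel for the site dropdown, the column names, and the sample rows are assumptions for illustration.

```python
import pandas as pd

# Made-up sample rows with hypothetical column names.
launches = pd.DataFrame({
    "launch_site": ["KSC LC-39A", "KSC LC-39A", "CCAFS SLC-40", "CCAFS SLC-40"],
    "payload_mass_kg": [3500.0, 6000.0, 2500.0, 4500.0],
    "booster_version": ["FT", "B4", "FT", "FT"],
    "landing_class": [1, 0, 1, 0],
})


def filter_launches(df: pd.DataFrame, site: str, payload_range: tuple) -> pd.DataFrame:
    """Rows matching the selected site and the (low, high) payload slider values."""
    low, high = payload_range
    mask = df["payload_mass_kg"].between(low, high)  # inclusive on both ends
    if site != "ALL":
        mask &= df["launch_site"] == site
    return df[mask]


def successes_per_site(df: pd.DataFrame) -> dict:
    """Successful landings per site, the values a pie chart would plot."""
    return df.groupby("launch_site")["landing_class"].sum().to_dict()


selected = filter_launches(launches, "ALL", (2000, 5000))
print(selected[["launch_site", "payload_mass_kg", "booster_version", "landing_class"]])
print(successes_per_site(launches))
```

In a Dash app, functions like these would typically be called inside a callback whose inputs are the dropdown and slider values, with the filtered frame passed to a Plotly figure.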

🤖 Machine Learning Modeling

Goal: Predict success of Falcon 9 first-stage landings.

Models trained:

Logistic Regression
Support Vector Machine (SVM)
Decision Tree
K-Nearest Neighbors (KNN)

Approach:

Scaled features and split the dataset 80/20 into training and test sets.
Hyperparameter tuning for all models.
Evaluation via accuracy score and confusion matrix.
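
The approach above (scaling, an 80/20 split, and hyperparameter tuning) can be sketched with scikit-learn as shown below for the SVM case. This is an assumption-laden illustration rather than the project's code: the data is synthetic (make_classification), and the parameter grid, the 5-fold cross-validation, and the random seeds are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the launch features and landing_class target.
X, y = make_classification(n_samples=90, n_features=6, random_state=0)

# 80/20 train/test split, stratified so both sets keep the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Scaling lives inside the pipeline, so it is fit on training folds only.
pipeline = make_pipeline(StandardScaler(), SVC())

# Illustrative grid; make_pipeline names the SVC step "svc".
param_grid = {"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print("Best parameters:", search.best_params_)
print(f"Test accuracy: {test_accuracy:.2f}")
```

Swapping SVC for LogisticRegression, DecisionTreeClassifier, or KNeighborsClassifier (with a matching parameter grid) gives the same tuning loop for the other models. Keeping the scaler inside the pipeline is a design choice that avoids leaking test-set statistics into training.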

Results:

All models achieved ~83% test accuracy.
False positives (predicting success for a landing that actually failed) were common, a likely consequence of the small dataset.
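
To make the false-positive point concrete, the sketch below reads the error counts out of a confusion matrix with scikit-learn. The labels are invented so that the arithmetic is easy to follow (18 test samples, 15 correct); they are not the project's actual predictions.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Invented labels: 12 real successes and 6 real failures.
y_true = [1] * 12 + [0] * 6
# Hypothetical model output: every success is caught, but 3 failures are called successes.
y_pred = [1] * 12 + [1] * 3 + [0] * 3

# With labels=[0, 1], ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
accuracy = accuracy_score(y_true, y_pred)

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"Accuracy: {accuracy:.3f}")
```

In this made-up case every error is a false positive, which shows how a model can post a respectable accuracy while still being over-optimistic about landings. With so few failures in a small test set, a handful of misclassified launches moves the numbers a lot.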

📌 Key Takeaways

It’s feasible to build a predictive model for SpaceX landing success using historical data.
Features like launch site, payload mass, and booster version significantly impact the outcome.
Interactive dashboards and geospatial visualization enhance exploratory capabilities.
Data limitations (size and imbalance) affect generalization — future work could include more recent missions or additional parameters (e.g., weather, telemetry).

🔗 Project Assets

📂 GitHub Repository: Capstone SpaceX Launch Success
📍 Presentation Slides: See full breakdown of analysis here