🚀 SpaceX Launch Success Prediction with Machine Learning

  • Writer: Alvaro Mejia
  • May 28, 2025
  • 2 min read

How I predicted SpaceX rocket landing success using real launch data and interactive dashboards.


Role: Data Scientist

Team Size: Individual project

Duration: 3 weeks

Stack: Python · Pandas · SQL · Plotly Dash · Folium · REST APIs · Scikit-learn · SVM · Decision Trees · Logistic Regression



🧠 Project Overview

This project was developed as the final capstone for the IBM Data Science Professional Certificate. The objective was to analyze SpaceX's Falcon 9 launch data to uncover which factors influence the success of first-stage rocket landings and to build machine learning models that predict landing outcomes.

By integrating data science techniques — from API usage and web scraping to EDA, dashboarding, and machine learning — this project demonstrates an end-to-end approach to solving a real-world aerospace problem using data.


🔍 Problem Statement

SpaceX can significantly reduce launch costs by recovering and reusing the first stage of its Falcon 9 rockets. However, not all launches result in successful landings. Key questions addressed:

  • Can we predict whether a rocket will land successfully based on launch characteristics?

  • Which features (payload, orbit, launch site, etc.) are most influential?


📊 Data Collection & Processing

Sources:

Steps:

  • Data was retrieved through REST API calls made with Python's requests library and through web scraping with BeautifulSoup.

  • Cleaned and filtered for Falcon 9 launches.

  • Missing values were handled (e.g., payload mass imputation).

  • A binary target variable was created: landing_class (1 = success, 0 = failure).
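
The cleaning steps above can be sketched in pandas. This is a minimal illustration, not the project's actual code: the column names (BoosterVersion, PayloadMass, Outcome), the sample rows, the outcome labels, and the choice of mean imputation are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical rows standing in for the combined API and scraped data.
launches = pd.DataFrame({
    "BoosterVersion": ["Falcon 1", "Falcon 9", "Falcon 9", "Falcon 9"],
    "PayloadMass": [20.0, 500.0, None, 3100.0],
    "Outcome": ["None None", "True ASDS", "False Ocean", "True RTLS"],
})

# Keep only Falcon 9 launches.
falcon9 = launches[launches["BoosterVersion"] == "Falcon 9"].copy()

# Impute missing payload mass with the mean of the known values
# (mean imputation is an assumed strategy here).
mean_payload = falcon9["PayloadMass"].mean()
falcon9["PayloadMass"] = falcon9["PayloadMass"].fillna(mean_payload)

# Binary target: 1 if the (assumed) outcome label starts with "True", else 0.
falcon9["landing_class"] = falcon9["Outcome"].str.startswith("True").astype(int)

print(falcon9[["PayloadMass", "landing_class"]])
```

Copying the filtered frame before modifying it avoids writing into a view of the original data.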


📈 Exploratory Data Analysis (EDA)

Techniques used:

  • SQL queries on launch data for patterns (e.g., payload averages, site success rates).

  • Visual analysis with seaborn, Matplotlib, and pandas.

  • Key findings:

    • Launch site, payload mass, and booster version are strongly correlated with landing outcomes.

    • KSC LC-39A had the highest success rate of any launch site (~77%).

    • The best results were achieved with payloads between 2,000 and 5,500 kg and with booster version FT.
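
A per-site success rate of the kind quoted above can be computed with a single GROUP BY query. The sketch below uses an in-memory SQLite database; the table name, column names, site labels, and rows are placeholders invented for the example, not the project's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE launches ("
    "launch_site TEXT, payload_mass_kg REAL, landing_class INTEGER)"
)

# Placeholder rows; site names and values are illustrative only.
rows = [
    ("Site A", 2500.0, 1),
    ("Site A", 4000.0, 1),
    ("Site A", 6000.0, 0),
    ("Site B", 3000.0, 0),
    ("Site B", 5000.0, 1),
]
conn.executemany("INSERT INTO launches VALUES (?, ?, ?)", rows)

# Because landing_class is 0 or 1, its average is the success rate.
query = """
SELECT launch_site,
       COUNT(*)             AS launches,
       AVG(landing_class)   AS success_rate,
       AVG(payload_mass_kg) AS avg_payload_kg
FROM launches
GROUP BY launch_site
ORDER BY success_rate DESC
"""
site_stats = conn.execute(query).fetchall()
conn.close()

for site, n, rate, avg_payload in site_stats:
    print(f"{site}: {n} launches, success rate {rate:.2f}, "
          f"mean payload {avg_payload:.0f} kg")
```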


🌍 Interactive Visualization

Tools: Folium, Plotly Dash

  1. Folium Map:

    • Visualized launch site locations.

    • Overlaid launch outcomes and distances to coastlines and airports.

  2. Dashboard (Plotly Dash):

    • Pie Chart: Displays success rate per launch site.

    • Payload Slider + Scatter Plot: Shows how landing outcome varies with payload mass and booster version.

    • Fully interactive — adjust site and payload range to explore outcomes.
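
The distance overlays on the map reduce to a great-circle calculation between a launch site and a nearby point of interest. One common way to do this is the haversine formula, sketched below. The function name and the coordinates are assumptions for illustration (the site coordinates are approximate and the coastline point is hypothetical); the post does not say which formula the project used.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate launch site position and a hypothetical coastline point.
site = (28.57, -80.65)
coast = (28.57, -80.64)
print(f"Distance to coastline: {haversine_km(*site, *coast):.2f} km")
```

The resulting value can then be attached to a map marker or line label.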


🤖 Machine Learning Modeling

Goal: Predict success of Falcon 9 first-stage landings.

Models trained:

  • Logistic Regression

  • Support Vector Machine (SVM)

  • Decision Tree

  • K-Nearest Neighbors (KNN)

Approach:

  • Scaled the features and split the dataset 80/20 into training and test sets.

  • Hyperparameter tuning for all models.

  • Evaluation via accuracy score and confusion matrix.
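
The approach above (scale, split 80/20, tune, evaluate) might look like the following scikit-learn sketch for one of the four models. Everything specific here is an assumption: the data is synthetic, and the scaler, the parameter grid, the 5-fold cross-validation, the stratified split, and the random seeds are illustrative choices rather than the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the launch feature matrix and landing_class target.
X, y = make_classification(n_samples=90, n_features=8, random_state=0)

# 80/20 train/test split (stratification is an added assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Scaling lives inside the pipeline so each CV fold fits its own scaler.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Illustrative grid over the regularization strength.
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# Evaluate the tuned model on the held-out test set.
y_pred = search.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

print("Best parameters:", search.best_params_)
print("Test accuracy:", round(test_accuracy, 3))
print("Confusion matrix (rows = actual, columns = predicted):")
print(cm)
```

Swapping LogisticRegression for SVC, DecisionTreeClassifier, or KNeighborsClassifier, with a matching parameter grid, would cover the other models in the same way.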

Results:

  • All models achieved ~83% test accuracy.

  • False positives (predicting a successful landing when the landing actually failed) were common, likely because of the small dataset.
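
To make the false-positive definition concrete, the sketch below counts the four confusion-matrix cells by hand. The helper name and the label lists are toy values made up for the example; they are not the project's predictions.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = landed)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            counts["tp"] += 1
        elif predicted == 1 and actual == 0:
            counts["fp"] += 1  # predicted a landing that actually failed
        elif predicted == 0 and actual == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1  # predicted a failure that actually landed
    return counts


# Toy labels for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]

counts = confusion_counts(y_true, y_pred)
accuracy = (counts["tp"] + counts["tn"]) / len(y_true)
print(counts, accuracy)
```

With a test set this small, each misclassified launch shifts accuracy by several percentage points, which is one reason to read the confusion matrix alongside the accuracy score.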

📌 Key Takeaways

  • It’s feasible to build a predictive model for SpaceX landing success using historical data.

  • Features like launch site, payload mass, and booster version significantly impact the outcome.

  • Interactive dashboards and geospatial visualization enhance exploratory capabilities.

  • Data limitations (small sample size and class imbalance) restrict how well the models generalize; future work could include more recent missions or additional parameters (e.g., weather, telemetry).


🔗 Project Assets


 
 
 



Alvaro Mejia

Data Scientist | Python, SQL | Machine Learning & Big Data Enthusiast

  • LinkedIn
  • GitHub
  • Youtube

©2024 by Alvaro Mejia
