Cafe Sales Data Cleaning and Analysis

October 2025


Why I Took On This Project

Data cleaning and analysis are critical skills for cloud and data engineering roles. I chose a messy 10,000-row cafe sales dataset from Kaggle to practice cleaning, querying, and cloud deployment using Python, SQL, and AWS RDS. This project showcases my ability to handle real-world data challenges and deploy solutions in the cloud, aligning with my transition to IT and cloud security. See more at trisoncloudresume.com.

What I Built

I cleaned a 10,000-row Kaggle dataset (Cafe Sales - Dirty Data), addressing missing values, duplicates, and format issues. The cleaned data (8,733 rows) was loaded into SQLite and AWS RDS (PostgreSQL) for analysis. Advanced SQL queries analyzed sales trends, and the results are visualized in an HTML page.
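To make the clean, load, and query flow concrete, here is a minimal sketch using only the Python standard library. It is not the project's actual code: the sample rows, the column names, the `ERROR`/`UNKNOWN` markers, and the choice to drop bad rows rather than repair them are all assumptions made for illustration, and it uses an in-memory SQLite database rather than AWS RDS.

```python
import sqlite3
from datetime import date

# Hypothetical sample rows standing in for the 10,000-row CSV.
# Column names and dirty markers are assumptions, not the real schema.
RAW_ROWS = [
    {"transaction_id": "TXN_001", "item": "Coffee", "quantity": "2",
     "price_per_unit": "2.0", "transaction_date": "2023-01-15"},
    {"transaction_id": "TXN_001", "item": "Coffee", "quantity": "2",
     "price_per_unit": "2.0", "transaction_date": "2023-01-15"},  # duplicate
    {"transaction_id": "TXN_002", "item": "Cake", "quantity": "ERROR",
     "price_per_unit": "3.0", "transaction_date": "2023-01-18"},  # bad number
    {"transaction_id": "TXN_003", "item": "", "quantity": "1",
     "price_per_unit": "3.0", "transaction_date": "2023-01-20"},  # missing item
    {"transaction_id": "TXN_004", "item": "Tea", "quantity": "3",
     "price_per_unit": "1.5", "transaction_date": "2023-02-03"},
    {"transaction_id": "TXN_005", "item": "Coffee", "quantity": "1",
     "price_per_unit": "2.0", "transaction_date": "UNKNOWN"},     # bad date
]

DIRTY_MARKERS = {"", "ERROR", "UNKNOWN"}  # assumed placeholders for bad data


def clean_rows(rows):
    """Drop rows with missing values, duplicate IDs, or unparseable fields."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        values = {key: value.strip() for key, value in row.items()}
        if any(value in DIRTY_MARKERS for value in values.values()):
            continue  # missing or placeholder value
        if values["transaction_id"] in seen_ids:
            continue  # duplicate transaction
        try:
            quantity = int(values["quantity"])
            price = float(values["price_per_unit"])
            date.fromisoformat(values["transaction_date"])  # format check
        except ValueError:
            continue  # format issue
        seen_ids.add(values["transaction_id"])
        cleaned.append((values["transaction_id"], values["item"],
                        quantity, price, values["transaction_date"]))
    return cleaned


def load_and_summarize(cleaned):
    """Load cleaned rows into SQLite and return revenue per month."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (transaction_id TEXT PRIMARY KEY, item TEXT, "
        "quantity INTEGER, price_per_unit REAL, transaction_date TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", cleaned)
    monthly = conn.execute(
        "SELECT substr(transaction_date, 1, 7) AS month, "
        "SUM(quantity * price_per_unit) AS revenue "
        "FROM sales GROUP BY month ORDER BY month"
    ).fetchall()
    conn.close()
    return monthly


cleaned = clean_rows(RAW_ROWS)
monthly = load_and_summarize(cleaned)
print(f"{len(RAW_ROWS)} raw rows -> {len(cleaned)} clean rows")
for month, revenue in monthly:
    print(month, revenue)
```

Dropping a row on any bad field is the simplest policy and would explain a row count that shrinks after cleaning, but recomputing a missing value from the other columns would be an equally reasonable choice.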

What I Learned

This project deepened my understanding of data engineering and cloud database management.

Challenges and Solutions

The dataset and cloud setup presented several challenges.

Why It’s a Win

This project demonstrates my ability to clean and analyze large datasets, deploy cloud databases, and visualize results. Starting with a messy 10,000-row dataset, I reduced it to 8,733 clean rows, ran advanced SQL queries in SQLite and AWS RDS, and created a polished HTML visualization. The process honed the Python, SQL, and AWS skills that junior cloud and data engineering roles call for. Check out the results at cafe_sales_analysis.html and my portfolio at trisoncloudresume.com.