2026
Python, pandas, SQLAlchemy, SQLite
CreditStream
Overview
A Python ETL pipeline that processes 50,000 real-world loan records: it extracts raw CSV data, transforms it into clean, structured records, and loads the result into a SQLite database.
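The Extract → Transform → Load flow above can be sketched as three small functions wired together by a runner that reports row counts. This is a minimal sketch, not the project's code: the function names (`extract`, `transform`, `load`, `run_pipeline`), the `loans` table name and the sample columns are hypothetical, and `transform` is only a placeholder cleaning step.

```python
import io

import pandas as pd
from sqlalchemy import create_engine, text
from sqlalchemy.engine import Engine


def extract(source) -> pd.DataFrame:
    """Extract: read raw loan records from a CSV file path or file-like object."""
    return pd.read_csv(source)


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Transform (placeholder): drop fully empty rows and normalise column names."""
    out = df.dropna(how="all").copy()
    out.columns = [str(c).strip().lower() for c in out.columns]
    return out


def load(df: pd.DataFrame, engine: Engine, table: str = "loans") -> int:
    """Load: write the frame to SQLite, replacing any earlier run, and return the stored row count."""
    df.to_sql(table, engine, if_exists="replace", index=False)
    with engine.connect() as conn:
        # The table name is a fixed, trusted string here, not user input.
        return conn.execute(text(f"SELECT COUNT(*) FROM {table}")).scalar_one()


def run_pipeline(source, engine: Engine, table: str = "loans") -> dict:
    """Run Extract -> Transform -> Load and return a small summary of row counts."""
    raw = extract(source)
    clean = transform(raw)
    loaded = load(clean, engine, table)
    return {"extracted": len(raw), "transformed": len(clean), "loaded": loaded}


if __name__ == "__main__":
    # Tiny inline CSV standing in for the real file; the ",," row is entirely empty.
    sample = io.StringIO(
        "Loan_ID,Sector,Credit_Score\n"
        "L1,Technology,720\n"
        ",,\n"
        "L2,Healthcare,610\n"
    )
    # "sqlite://" is an in-memory database; "sqlite:///loans.db" would write a file instead.
    engine = create_engine("sqlite://")
    print(run_pipeline(sample, engine))
```

Keeping each stage a plain function that takes and returns a DataFrame makes the stages testable in isolation, and passing the SQLAlchemy engine in (rather than creating it inside `load`) lets the same code target an in-memory database in tests and a file-backed one in normal runs.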
Technical Details
Ingests a 50,000-record loan portfolio dataset spanning 10 industry sectors, including Technology, Healthcare, and Real Estate. The transform stage handles null values, corrects column data types, engineers a risk tier column from each borrower's credit score, and calculates an expected loss ratio for each loan. SQLAlchemy then loads the cleaned dataset into a local SQLite database. The pipeline follows a modular Extract → Transform → Load architecture and produces a summary report on completion.
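The transform steps described above (type fixes, null handling, a credit-score risk tier and an expected loss ratio) might look roughly like the sketch below. Everything specific here is an assumption rather than the project's actual logic: the column names, the tier cut-offs (580 and 670), and the formula, which uses the standard credit-risk identity EL = PD × LGD × EAD so that the ratio EL / EAD reduces to PD × LGD, treating `loan_amount` as the exposure.

```python
import pandas as pd

# Hypothetical column names; the real dataset's schema is not documented here.
NUMERIC_COLS = ["credit_score", "loan_amount", "probability_of_default", "loss_given_default"]

# Hypothetical credit-score cut-offs, left-closed: [0, 580) High, [580, 670) Medium, [670, inf) Low.
TIER_BINS = [0, 580, 670, float("inf")]
TIER_LABELS = ["High", "Medium", "Low"]


def transform_loans(df: pd.DataFrame) -> pd.DataFrame:
    """Clean raw loan rows and add risk_tier, expected_loss_ratio and expected_loss columns."""
    out = df.copy()

    # Fix data types: coerce numeric columns, turning unparseable values into NaN.
    for col in NUMERIC_COLS:
        out[col] = pd.to_numeric(out[col], errors="coerce")

    # Handle nulls: drop rows missing any value the derived columns depend on,
    # and label a missing sector explicitly instead of dropping the loan.
    out = out.dropna(subset=NUMERIC_COLS).reset_index(drop=True)
    out["sector"] = out["sector"].fillna("Unknown")
    out["credit_score"] = out["credit_score"].astype(int)

    # Engineer the risk tier from credit score (lower score -> higher risk).
    out["risk_tier"] = pd.cut(
        out["credit_score"], bins=TIER_BINS, labels=TIER_LABELS, right=False
    )

    # Assumed formula: EL = PD x LGD x EAD, so EL / EAD = PD x LGD,
    # with loan_amount standing in for exposure at default.
    out["expected_loss_ratio"] = out["probability_of_default"] * out["loss_given_default"]
    out["expected_loss"] = out["expected_loss_ratio"] * out["loan_amount"]
    return out
```

Because the bins are left-closed (`right=False`), a score of exactly 580 falls in Medium and 670 in Low; computing the ratio as PD × LGD directly also avoids dividing by a zero `loan_amount`.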
Technologies
Python, pandas, SQLAlchemy, SQLite