Personal Finance Analysis Project
- Adam Romriell
- Jan 28
- 5 min read
Updated: Sep 16
Link to project Repo: https://github.com/TheAdmrial/personal_finance_analysis_app
Time frame: August 2024 - Present day
Background
Like most couples, my wife and I always question where our money is going. We currently bank with a credit union, and whenever we get paid, we allocate a certain amount of money to our various savings accounts. It's like the cash envelope system but digital. (Maybe call it the digital envelope, idk)
In the past, when my wife asked money questions, I would boot up R Studio and try to answer them as best I could. That was time-consuming.
The ultimate vision for this project is to automate all the steps involved in gathering, cleaning, storing, and visualizing data without manual cleaning or tweaking.
I also tried to stream myself making this project from August to December. I decided against this for a couple of reasons.
First, focusing on the work and interacting with people who were chatting with me was hard. I realized it would take me forever, and I wanted to complete this project to answer my wife's questions about our finances.
Second, based on the advice of other content creators' videos, I decided to post on LinkedIn and YouTube rather than spend time on Twitch. For many reasons, these content creators recommended that I spend my time on the former rather than the latter.
Disclaimer
At the top, I've never done an end-to-end analysis project and bit off WAY more than I could chew. I thought this project would take a couple of months. But everything that I wanted this project to be, I realized at the end of 2024 that this was going to take more work to achieve the ultimate vision for this project.
I think this is the most important lesson I learned: answer the question first, then add more to help answer it. In data analysis, answering the question is the priority, not the fancy shiny things, to eventually get to the answer.
This project includes some Python, R, and SQL code. The Python and SQL code is my attempt to automate the data-cleaning step. I switched to R to answer the original question. I've scrubbed the financial data on purpose, and there should be some general or made-up numbers where the actual numbers should be.
Gathering the data
For the time being, this is all manual. I gathered CSV files from our bank and credit card from the last year of activity. Anyone can do this with their current bank account or credit card information. If you do this yourself, make sure you check the CSV option. I think it's the best option for working with data in R. Make sure the data gets to the data folder.

Initially, the plan was to create a database, which will serve as the basis for the dashboard. However, we'll have to update the financial files with the latest data for now.
Clean the Data
First, the task was to homogenize the data from two different sites: the bank and the credit card company files. The columns they have in common are date, amount, and description.

I also wanted to add which account the transaction came from. That should help to answer some questions down the line about how to categorize certain transactions.

Credit cards do an interesting thing when looking at your statements. They make it look like you're gaining money with transactions and losing money by paying it off. Therefore, I multiplied by -1 to switch the logic.

Now that the data is combined, it is time to categorize and add the company to each transaction. As you can imagine, this was a tedious process. Most of the categorization can be done by just looking at the description column. I will show examples of times when using the date and/or amount is needed as well.
First I need to get rid of the transactions I don't need. My wife and I bank at a credit union we can open many sufix savings accounts and follow a sort of 'digital envelope' method. A certain amount is transferred every paycheck our rent, utilities, groceries, gas, etc. Once those accounts run out of money, we can't spend any more on those categories.
Throughout the script, I used a str_like (string like) function that works similarly to a LIKE clause in SQL with plenty of wildcards to help identify what goes where.

Here are some basic examples for the haircuts and car maintenance categories:

Here are some more complex examples from the Baby Fund category

The whole cleaning script is just over a thousand lines of code. I'm sure I can streamline this data-cleaning process more. My primary concern is to analyze the data and get answers to our finance questions. I'll have a next steps section where I'll talk about future plans for this project at the end.
Store the Data
Let's talk about storing the data.
For now, the cleaned data will live in a CSV file on my computer. I have a database structure set up to store it long-term. I'll explain that structure now.

Since I don't have much experience with star schema, I thought this project would be a great place to learn. 'Transactions' is my fact table with 'company' and 'transanction_type' as my dimension tables.
The eventual plan is to house all the data here and then have Power BI sit on top of this database. I thought this would be a great place to learn and practice connecting a database to Power BI.
This part of the project is on pause while I actually analyze the financial data that I have. The code to build this database currently lives on GitHub if you'd like to take a look there.
Visualize it
Now for the important and fun part! Visualization!
I love visualizations, and I'm very communication-focused when it comes to data analysis. At the end of the day, a client wants to know the answers to their questions. Once a dashboard is set up correctly, it's easy to slice, dice, and look at specific sections under certain filtered conditions. As new questions come up, you can filter to the right criteria and see the answer in a few seconds, rather than make a new graph.

Next Steps/Future Improvements
1) Continue adding and cleaning our monthly data
2) Create a UI to categorize new transactions as the data is being cleaned
3) Load all the data into a database for faster data refresh on the dashboard
4) Add in our monthly budget to show if we were under or over budget for the month
5) Refine the metrics that we use to gauge the health of our expenses
Comments