My first project for Flatiron School’s Data Science course is done, and I am extremely proud not only of what I have done but also of what I have learned!
When I was given my first project, I felt pretty overwhelmed by what was being asked of me: help Microsoft (hypothetically) better understand the movie industry, explore what types of films are currently doing best at the box office, and translate my findings into actionable insights. Looking back, though, I just needed to get started and set milestones that would keep me motivated until I reached the finish line.
One of my first milestones was simply to look at the data. After looking at the data provided by Flatiron, I decided to incorporate data from another source to get a real feel for what data scientists must do when data is not provided for them. A quick Google search led me to the research article “Predicting Movie Box Office Profitability: A Neural Network Approach.” From this article, I found out I could get data from the OMDb API, which has data from Metacritic, Rotten Tomatoes, IMDb, The Numbers, and Movie Internet Database. So, I learned how to obtain an API key, read the Python documentation for it, and incorporated it into my code.
(Source: http://www.omdbapi.com/)
It felt so rewarding to learn how to make requests to an API that researchers had also used.
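For readers curious what such a request looks like, here is a minimal sketch using only Python’s standard library. It is an illustration, not the project’s actual code: `YOUR_API_KEY` is a placeholder, and the helper names `build_omdb_url`, `fetch_movie`, and `extract_ratings` are hypothetical. The query parameters `apikey`, `t` (title), and `y` (year), and the `Response` and `Ratings` fields of the JSON reply, follow OMDb’s public documentation.

```python
import json
import urllib.parse
import urllib.request

OMDB_BASE_URL = "http://www.omdbapi.com/"


def build_omdb_url(title, api_key, year=None):
    """Build an OMDb request URL that looks a movie up by its title."""
    params = {"apikey": api_key, "t": title}
    if year is not None:
        params["y"] = str(year)  # optional year narrows ambiguous titles
    return OMDB_BASE_URL + "?" + urllib.parse.urlencode(params)


def fetch_movie(title, api_key, year=None, timeout=10):
    """Request one movie from OMDb and return the decoded JSON as a dict."""
    url = build_omdb_url(title, api_key, year)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_ratings(movie):
    """Map each rating source (e.g. 'Rotten Tomatoes') to its value string.

    OMDb sets "Response" to the string "True" on success; anything else
    (such as a not-found error) yields an empty dict here.
    """
    if movie.get("Response") != "True":
        return {}
    return {r["Source"]: r["Value"] for r in movie.get("Ratings", [])}
```

Usage would look like `extract_ratings(fetch_movie("Inception", "YOUR_API_KEY"))`, which needs a real key and network access. In practice you would loop over the titles in your own dataset and pause between calls, since free keys are rate-limited.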
After extensive data cleaning and wrangling, I created all the plots I needed to complete my project. One of the most helpful graphs I discovered and used throughout the project was the violin plot. Violin plots are similar to box plots, but they also show how densely the data points are distributed across the range of values. This is very important when comparing subsets of data, such as the R.O.I. distribution of movies rated ‘PG-13’ versus that of movies rated ‘R’. (Source: autodeskresearch.com)
I will definitely be using these more in the future.
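To show the idea, here is a minimal sketch of an R.O.I. violin plot with matplotlib’s `Axes.violinplot` (seaborn’s `violinplot` is a common alternative). It assumes R.O.I. is defined as (gross − budget) / budget, and the (gross, budget) pairs below are made-up numbers for illustration only, not data from the project. The helper names `compute_roi` and `plot_roi_violins` are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is required
import matplotlib.pyplot as plt


def compute_roi(gross, budget):
    """Assumed R.O.I. definition: profit as a fraction of the budget."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    return (gross - budget) / budget


def plot_roi_violins(roi_by_rating, ax=None):
    """Draw one violin per rating so their R.O.I. distributions line up side by side."""
    if ax is None:
        _, ax = plt.subplots(figsize=(6, 4))
    labels = list(roi_by_rating)
    data = [roi_by_rating[label] for label in labels]
    # Violins are placed at x = 1, 2, ... by default; medians are drawn as lines.
    parts = ax.violinplot(data, showmedians=True)
    ax.set_xticks(list(range(1, len(labels) + 1)))
    ax.set_xticklabels(labels)
    ax.set_xlabel("Rating")
    ax.set_ylabel("ROI")
    return ax, parts


# Made-up (gross, budget) pairs in millions of dollars, for illustration only.
toy_movies = {
    "PG-13": [(300, 100), (90, 60), (150, 50), (40, 45)],
    "R": [(60, 20), (25, 30), (110, 35), (15, 10)],
}
roi = {rating: [compute_roi(g, b) for g, b in pairs] for rating, pairs in toy_movies.items()}
ax, parts = plot_roi_violins(roi)
```

Calling `ax.figure.savefig("roi_violins.png")` (or `plt.show()` with an interactive backend) displays the result. The width of each violin at a given height reflects how many movies have R.O.I. values near it, which a box plot’s quartiles alone would hide.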
When I finally finished writing and commenting my code, I thought I was done with coding for this project; that is, until I read the project rubric. Apparently, I needed to organize my Python files to make the project more navigable. My first thought was a table of contents, but was it possible to make a table of contents that linked to numerous Jupyter Notebooks? Yes, it is, with a little bit of Google searching. I found this article very helpful: Markdown Cheat Sheet.
Additionally, I had to create a README.md. No sweat, right? Maybe GitHub has a template you can easily fill out. WRONG. The cheat sheet I cited above helped some, but I had to consult an additional source to figure out how Markdown cells work in Jupyter Notebooks. After more Google searching and watching tutorial videos, I was able to put together a solid-looking README.md. Here is a resource I used that helped immensely: Markdown for Jupyter notebooks cheatsheet.
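A table of contents that links to several notebooks is just a list of Markdown links with relative paths. Writing them by hand works fine; the minimal sketch below only illustrates the link syntax by generating the list in Python. The function name `notebook_toc` and the notebook paths are hypothetical, and it assumes the README or Markdown cell sits in the folder the paths are relative to.

```python
from pathlib import Path
from urllib.parse import quote


def notebook_toc(notebook_paths, heading="Table of Contents"):
    """Return a numbered Markdown table of contents linking to each notebook.

    Link text comes from the file name (underscores become spaces, words are
    title-cased). Paths are percent-encoded so names containing spaces still
    produce valid relative links on GitHub and in Jupyter.
    """
    lines = [f"## {heading}", ""]
    for i, path in enumerate(notebook_paths, start=1):
        p = Path(path)
        title = p.stem.replace("_", " ").title()
        lines.append(f"{i}. [{title}]({quote(p.as_posix())})")
    return "\n".join(lines)


# Hypothetical notebook layout, for illustration only.
print(notebook_toc(["notebooks/data_cleaning.ipynb", "notebooks/roi_analysis.ipynb"]))
```

The printed text can be pasted into a Markdown cell or a README.md, where each entry renders as a clickable link to its notebook.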
Considering everything I learned from this project, I am excited to see what learning experiences await me in future projects. In the end, I know these cumulative experiences will help me become an expert data scientist. If you would like to check out my project, here it is: