Wharton Data Science Academy

Abhay Sri
Aug 2, 2021

This summer, I was fortunate enough to attend the Wharton Data Science Academy. It was an amazing experience, and I learned a lot from it. So, what is the Wharton Data Science Academy? In short, it's a rigorous course that accepts 75 students from around the world who want to learn the nuances of data science along with R, a popular programming language among data scientists (and one that I've covered in previous blogs). At the end of the course, each team conducted a final research project and presented its findings to everyone else attending. Due to COVID-19, the course was held virtually.

I started the course as a regular team member. Because I was the only one on my team with previous R experience, however, I basically led the others during exercises and helped answer their questions. The TAs noticed this and asked me to switch teams and lead another group that was struggling. By then I had already bonded with my teammates, but I ultimately chose to move. My new team was more international: two of its members were in Asia, one in China and the other in South Korea. It turned out that they had to completely flip their sleep schedules to attend the class, which started in the morning and ended in the late afternoon for those of us on Eastern Time.

As time went on, I bonded with my new teammates too. At first they didn't talk much and did most of the work individually, so getting them to work as a coordinated team was a hassle. Once everyone had bonded, though, we were good to go. We started cruising through the assigned homework, and the rigorous coursework and quizzes became much easier.

A lot of what we learned was relevant and useful no matter what data set we worked with. For example, one of the key skills we learned was Exploratory Data Analysis, also known as EDA. EDA is a set of processes and methods for cleaning and viewing data sets: it helps us visualize the data and remove null or irrelevant values. We also learned about several packages and how to use them, including how to write R Markdown documents and knit them into reports. On top of that came the data science concepts themselves, such as backward selection, lasso regression, and random forest regression.

Working with a team that was active and wanted to learn was definitely a blast, and all of our hard work culminated in our final project. The program's head, Linda Zhao, recommended that we do something relevant to climate change, so I suggested to my group that we work with pH data sets, since water acidification is a direct result of carbon emissions. We ended up choosing data from the Delaware River at Trenton, NJ, and performed statistical analysis on it. We first filtered out the non-significant variables, and then built a model to predict the pH level from the other water quality parameters. You can view our final report, which is attached at the bottom!
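To give a feel for the cleaning side of EDA mentioned above, here is a minimal sketch of dropping rows with missing values and summarizing what's left. A few caveats: we did this in R during the course, so this Python version is only a stand-in; the column names (`temp_c`, `dissolved_oxygen`, `ph`) and all of the numbers are made up for illustration and are not from our Delaware River data; and the helpers `drop_missing` and `summarize` are hypothetical names of my own, not functions from any package.

```python
from statistics import mean, median

# Hypothetical water-quality records (made-up values, NOT the project's data).
# None marks a missing measurement.
records = [
    {"temp_c": 18.2, "dissolved_oxygen": 9.1, "ph": 7.8},
    {"temp_c": 21.5, "dissolved_oxygen": None, "ph": 7.6},
    {"temp_c": 24.0, "dissolved_oxygen": 7.9, "ph": 7.4},
    {"temp_c": None, "dissolved_oxygen": 8.4, "ph": 7.7},
    {"temp_c": 26.3, "dissolved_oxygen": 7.2, "ph": 7.3},
]


def drop_missing(rows):
    """Keep only the rows in which every field has a value."""
    return [row for row in rows if all(v is not None for v in row.values())]


def summarize(rows, column):
    """Return basic summary statistics for one numeric column."""
    values = [row[column] for row in rows]
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }


# Clean first, then summarize every column of the cleaned data.
clean = drop_missing(records)
summary = {col: summarize(clean, col) for col in clean[0]}

for col, stats in summary.items():
    print(col, stats)
```

Dropping incomplete rows is the bluntest way to handle missing values. It is fine for a quick first look, but on a real data set you would want to check how many rows it throws away before trusting the summaries.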