R/RStudio

  • Conducted a Bayesian analysis of over 19,000 artworks, exploring relationships between material usage, art-historical periods, and gallery locations with multilevel logistic regression, random-effects, and clustering models. Reshaped and cleaned the dataset and validated historical theories through data analysis, in a solo project bridging data science and the humanities.

  • Consulted for UCLA’s English Department on the Informed Placement Process (IPP). Used logistic and linear models with a range of model-fit diagnostics to explore the impacts of academic self-confidence and demographics on English Language Placement.

  • Conducted an in-depth analysis of McDonald’s menu items to evaluate nutritional value and cost efficiency. Explored trade-offs between affordability and health using linear regression and visualizations.

  • Built a Random Forest classifier to predict the probability of obesity status from a dataset of 32,000+ patients. Used backward selection by BIC for feature selection and compared several candidate models before settling on Random Forest.

  • Used an online simulator to run a Randomized Complete Block Design (RCBD) experiment on the effect of exercise on memory retention, following ethical guidelines throughout.

  • Created a predictive model using U.S. GDP and economic indicators from 1966 to 2023 (including median housing prices and personal consumption and federal expenditures).

Python

  • Developed a multi-class galaxy classifier on Galaxy Zoo data, using oversampling and feature scaling to address class imbalance and improve model performance; further tuned a neural network architecture.

  • Conducted sentiment analysis on a 50,000-review IMDB dataset, training and evaluating multiple classifiers (Logistic Regression, Naive Bayes, SVM) and optimizing their hyperparameters to achieve high accuracy.