
hi, i'm

kyle zhao

data science · uc san diego

Learning about artificial intelligence to better our lives.


projects

League of Legends Esports: ADC Gold Lead Impact Analysis

featured

Statistical analysis of professional League of Legends match data to determine whether an ADC gold lead at 15 minutes significantly affects a team's win probability. Collaborative UCSD DSC 80 project analyzing 10,000+ competitive matches from the Oracle's Elixir dataset.

  • Analyzed 150,000 rows of professional esports data using hypothesis testing and permutation tests (p < 0.001)
  • Built Random Forest classification model achieving 75% accuracy with feature engineering and GridSearchCV tuning
  • Showed through statistical analysis that teams with an ADC gold lead achieve 65-70% win rates
  • Checked model fairness across game scenarios with a permutation test, which found no statistically significant difference (p = 0.156)
Python · Pandas · Scikit-learn · Random Forest · Hypothesis Testing · Plotly
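
For readers unfamiliar with the permutation tests mentioned in this project, here is a minimal sketch of the general idea. It is not the project's actual code: the function name `permutation_test`, its parameters, and the toy win/loss data are all hypothetical and chosen only for illustration.

```python
import random


def permutation_test(wins_a, wins_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in mean win rate.

    wins_a, wins_b: sequences of 1 (win) / 0 (loss) for two groups.
    Returns (observed difference, estimated p-value).
    """
    rng = random.Random(seed)
    n_a, n_b = len(wins_a), len(wins_b)
    observed = sum(wins_a) / n_a - sum(wins_b) / n_b

    pooled = list(wins_a) + list(wins_b)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffling the pooled outcomes simulates "group labels do not matter".
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Made-up toy data (NOT from the Oracle's Elixir dataset):
# 20 games with an ADC gold lead at 15 minutes, 20 games without.
with_lead = [1] * 14 + [0] * 6
without_lead = [1] * 6 + [0] * 14

observed, p_value = permutation_test(with_lead, without_lead)
print(f"observed difference in win rate: {observed:.2f}")
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value here would suggest that a difference this large is unlikely if the gold lead had no relationship to winning. A large one, like the p = 0.156 reported for the fairness check, means the test found no such evidence.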

Utilizing Pandas to Analyze "Friends" for a Reboot

Exploratory and statistical analysis of the sitcom Friends to identify patterns in viewership, dialogue distribution, and emotional range to inform a potential reboot strategy.

  • Analyzed episode-level data including directors, writers, and viewership trends
  • Applied hypothesis testing and regression analysis to examine IMDb ratings and audience patterns
  • Used NLP techniques to analyze dialogue distribution and generate episode titles in the style of the original show
Python · Pandas · NLP · Statistical Analysis · Regression

Diabetes Risk Prediction Using Medical Records

Built a logistic regression model to predict diabetes onset using medical record data, with a focus on data cleaning, visualization, and interpretability.

  • Cleaned and preprocessed medical data by handling missing and invalid values
  • Visualized feature distributions and correlations using histograms and scatter plots
  • Achieved 75% accuracy on a held-out test dataset using logistic regression
Python · Pandas · Scikit-learn · Matplotlib · Logistic Regression

Image Classification with CNNs on CIFAR-10

Collaborated on developing a convolutional neural network to classify images from the CIFAR-10 dataset, focusing on model optimization and performance evaluation.

  • Implemented and trained a CNN using TensorFlow
  • Experimented with hyperparameters to optimize validation performance
  • Achieved 79.74% test accuracy on CIFAR-10 image classification
Python · TensorFlow · CNNs · Machine Learning

contact

open to internships, collaborations, and conversations.

typical response time: 24–48 hours
for urgent matters: reach out via linkedin

about me

Hello! I'm Kyle!

I'm a second-year data science student at the University of California, San Diego. My coursework centers on data analysis and statistics, but I also like learning about topics outside of data science, including nutrition, human psychology, kinesiology, entrepreneurship, software engineering, and environmental sciences.

I was born and raised in Princeton, New Jersey, but for now I am studying in San Diego, California. In the future, I envision myself operating out of a city like San Francisco or New York.

In my free time, I like to work out at the gym, cook nutritious food, make social media content, play table tennis, and vibe-code! I'm always willing to try out new things, so this list may grow in the future!

I'm currently looking for ways to expand my breadth of knowledge and experience through research and internships, so please feel free to reach out if you have any related opportunities!

skills

Python · SQL · TypeScript · Scikit-learn · TensorFlow · Pandas · NumPy · Matplotlib · Jupyter · Git · GitHub

Back at home, I have a goldendoodle named Minnie, who is always "hungry."
