Syed Hasan



  • Python, C++, MySQL, NumPy, Scikit-learn, Pandas, NLTK, Keras, PySpark, MLlib, Matplotlib, Seaborn, Dimple.js, JavaScript, HTML, CSS, Beautiful Soup, Flask, Git, GCP, AWS EC2.


2017–2018: Udacity Nanodegree in Data Science

2007–2016: University of Kentucky, Ph.D. (ABD) in Physics

  • Performed regression analysis on radioactive source data from silicon detectors using Python and ROOT for detector calibrations at the Los Alamos National Lab.
  • Set up, calibrated, and modified instrumentation for the UCNB (Ultra Cold Neutron B) experiment and ran the experiment, improving the position sensitivity, signal-to-noise ratio, resolution, and fast timing of the detector system.
  • Wrote C++ code to simulate experimental conditions.
  • Authored 5 peer-reviewed publications and gave 3 invited talks.

2007–2010: University of Kentucky, M.S. in Physics

  • Physics Merit Scholarship and Gold Medal recipient.


2019–Present: Data Science Consultant

  • Predicted the fraction of a client's customer loans that would be delinquent by the end of the loan term using polynomial regression, with a mean squared error of 2.4e-8.
  • Classified a client's customer churn using logistic regression with MLlib in PySpark.

2019–Present: Data Science Fellow, SharpestMinds

  • Product Sentiment Analysis Web App (ly/twittsense): Implemented an NLP research paper as a web app. The app collects live tweets about a product through the Twitter API and determines the overall positive or negative sentiment expressed towards that product using convolutional neural networks and an LSTM.
  • Recommender System: Built a movie recommender system based on 20 million movie reviews using collaborative filtering in PySpark.
  • Spam Filter: Built an NLP pipeline to identify spam text with 93% accuracy.


  • Wrangling OpenStreetMap Data: Extracted, audited, cleaned, and transformed data about my city of Lexington, KY from XML to CSV, then queried the data using SQL and visualized interesting facts such as the most popular cuisines, amenities, and shops.
  • Identifying fraud using Enron data: Identified fraudulent employees at Enron using various machine learning techniques and algorithms. The final model yielded 89% accuracy and 65% precision despite 45% of the data being missing.
  • Forensic Cluster Analysis: Identified the number of hackers using K-means clustering on session metadata in PySpark.
  • Identifying the chemical causing food spoilage: Used a random forest classifier to identify the chemical most likely to cause food spoilage.