Saim Mehmood

MSc in Computer Science | Data & Software Engineer

About Me

My background is in data mining and machine learning, with applied research experience at the Data Mining Lab, York University. Most recently, I worked as an MLOps Engineer at a Google Cloud Platform (GCP) consulting company, where I used the GCP ML stack to deliver end-to-end machine learning models. I’m certified as a GCP Professional Machine Learning Engineer.

I have experience working with geospatial datasets, using PostgreSQL, PostGIS, and Python to extract insights. I typically visualize datasets along the way to understand the pipelines and models I’m creating.

I conducted research in cloud computing as an Erasmus Scholar at the Department of Information Technology, Uppsala University, Sweden.

During my graduate studies at York University, I worked under the supervision of Prof. Manos Papagelis.

Interests: data mining, machine learning, data science, natural language processing (NLP), cloud computing

Technical Knowledge and Skills:

• Data Mining & Visualization - Python (matplotlib, seaborn, pandas, numpy), PySpark, Power BI/Tableau

• Programming Languages - Python, Java, C#, C/C++

• Databases - PostgreSQL/PostGIS, MongoDB, MySQL

• Google Cloud Platform (GCP) - Vertex AI, BigQuery, Dataflow

• NLP & Machine Learning - spaCy, NLTK, TensorFlow/PyTorch/Keras

• Operating Systems - Linux/Mac, Windows

• Medium Profile

• Stack Overflow Profile

Experience

N. Harris Computer Corporation

Technical Consultant

Apr 2023 - Present

• Developing an understanding of the utilities industry (water, gas & electricity) by digging into advanced metering infrastructure (AMI) and meter data management (MDM) systems.

• Learning the SmartWorks MDM tool in depth, including its modules for water and electricity data analysis such as water loss analysis, leak detection, and transformer analysis.

• Working with clients to understand their requirements and implement customized solutions.

Badal.io

MLOps Engineer - GCP Consulting

Apr 2022 - Jan 2023

• Built end-to-end ML models (training, deployment & evaluation) using AutoML and custom model training with Vertex AI, BigQuery, Docker, and Dataflow for a client in the financial space, improving their ability to train ML models efficiently on GCP.

• Developed proofs of concept (POCs) for a real-estate client to detect bias in appraisal documents, leveraging the Document AI and Cohere APIs.

• Obtained the Google Cloud Professional Machine Learning Engineer certification.

Certificate URL: https://bit.ly/3WhLBzg

EECS, Lassonde School of Engineering, York University

Graduate Teaching Assistant

Jan 2018 - Dec 2022

Taught multiple courses at the Electrical Engineering and Computer Science (EECS) department.

• Courses: Programming for Mobile Computing (EECS 1022), Introduction to Database Systems (EECS 3412), Object Oriented Programming from Sensors to Actuators (EECS 1021), Software Design (EECS 3311)

Tasks included: directing tutorials, exam invigilation, final and midterm exam review sessions, grading assignments and exams, and holding office hours. Topics and tools: OOP, Java, Android Studio, IntelliJ

Data Mining Lab, Lassonde School of Engineering, York University

http://dminer.eecs.yorku.ca/

Graduate Research Assistant

Jan 2018 - Apr 2020

• Conducted research in trajectory data mining, machine learning, and statistical inference.

• Developed a method that uses the trajectories of moving objects (cars, pedestrians, etc.) to infer semantic similarities between geographical areas.

• Published the research, titled "Learning Semantic Relationships of Geographical Areas based on Trajectories", at the IEEE International Conference on Mobile Data Management 2020 (Versailles, France), where it received the Best Paper Award.

Swedish National Infrastructure for Computing, Uppsala University

Erasmus Scholar

Aug 2015 - Jan 2016

• Designed a framework inside SNIC using Apache Spark, SparkR & Jupyter Notebook to simplify computations of highly parallel scientific applications.

• Our project, titled "Towards Moving Scientific Applications in the Cloud", enabled researchers to seamlessly deploy their applications on the Spark server and scale them to multiple worker nodes as needed.

Notable Projects

Learning Semantic Relationships of Geographical Areas based on Trajectories

https://github.com/saimmehmood/semantic_relationships

Python (networkx, pandas, numpy, seaborn, matplotlib), PostgreSQL, PostGIS, MATLAB, Google Cloud (Places & Directions) API

• Developed a framework to understand semantic relationships between geographical areas based on object movement paths (trajectories). Received the Best Paper Award at the IEEE Conference on Mobile Data Management 2020.

Expert Developer Recommendation Using Very Large Datasets

https://github.com/saimmehmood/ExpertDeveloperRecommendation

SQL, Google BigQuery, Elasticsearch

• Built a search engine to find expert developers by utilizing GitHub datasets.

• Reduced 3 TB of data to 600 MB by keeping only developer-specific information such as number of commits, first and last commit, and average time between commits.

Cloud Computing, OpenStack, Apache Spark, Jupyter Notebook

• Cloud computing provides usability, scalability, and on-demand availability of remote computational and storage resources, which are the characteristics scientific applications require. The project had two parts: the first addressed the benefits of cloud infrastructure for end users, and the second was a performance analysis.

PostgreSQL, PostGIS, Python (numpy, pandas)

• An experimental use case to predict COVID-19 infection hotspots for a probable second wave of cases in the Manhattan area.

Education

York University

MSc Computer Science

Jan 2018 - June 2020

York University is a public research university in Toronto, Ontario, Canada. It is Canada's third-largest university, with approximately 55,700 students, 7,000 faculty and staff, and over 315,000 alumni worldwide.

My studies at York University focused on research in Data Mining, Big Data, Data Science, and Machine Learning. I published a research-track paper with my supervisor Manos Papagelis, titled "Learning Semantic Relationships of Geographical Areas Based on Trajectories", at the 21st IEEE International Conference on Mobile Data Management 2020. Our paper received the Best Paper Award.

Notable Courses: Data Mining, Mining Software Engineering Data

University of the Punjab

Bachelor of Science in Software Engineering

Sep 2012 - Jul 2016

University of the Punjab is a public research university located in Lahore, Punjab, Pakistan. It is the oldest public university in Pakistan.

Four years of undergraduate study at the University of the Punjab helped shape my understanding of cloud computing, software development, and requirements engineering. During my studies, I earned an Erasmus Mundus scholarship to spend an exchange semester at Uppsala University.

Notable Courses: Applied Cloud Computing, Software Requirements Engineering, Database Systems

Certifications

Google Cloud Certified Professional Machine Learning Engineer

https://www.credential.net/42cc4d30-75be-410c-8486-a94afbe73eff

Introduction to Quantum Computing

http://www.linkedin.com/learning/introduction-to-quantum-computing

Certificate No: AY7IBy3zoehD_C4j4fc-gqdE_brr

Volunteer Experience

SharpestMinds

https://www.sharpestminds.com/

Data Science Mentor

Jul 2022 - Present

• Helping recent graduates and people from other backgrounds transition into data science.

• Mentor Profile: https://app.sharpestminds.com/mentor-bio/saim-mehmood

Aggregate Intellect

https://ai.science/

Research Fellow

June 2021 - June 2022

• Open-sourcing natural language processing (NLP) packages to increase the visibility of Aggregate Intellect in the tech community - https://tinyurl.com/hz7bhrch

SharpestMinds

https://www.sharpestminds.com/

Data Science Fellow

June 2020 - April 2022

COVID-19 - Risk of Geographical Areas being infected - Python, PostgreSQL - PostGIS

• Using my existing research in trajectory data mining, developed a method to identify areas that are at a high risk of being infected by COVID-19 in the NYC Manhattan region.

Calculating Zakat using Python - Python, PyPDF2

• Used PyPDF2 to parse bank statements and generate the Zakat analysis and calculation.

• Actively contributed to organizing IBM’s CASCON x EVOKE 2019 conference.

Honors and Awards

Alongside my work in data mining and software engineering, I have received the following honors and awards:

  • Awarded York University Graduate Fellowship for the entire duration of M.Sc. Computer Science, January 2018
  • Electrical Engineering and Computer Science Graduate Student Association (EECS-GSA) York University, Vice-President Organization, Sep 2018 - 2019
  • Represented Pakistani youth in China, Pakistan Youth Delegation, August 2016
  • Won Erasmus Mundus Exchange Scholarship to spend an exchange semester at Uppsala University, May 2015
  • Winner, 17th In-House Speed Programming Competition, University of the Punjab, May 2015