Hi! Welcome to my personal website. I am a master's student in Econometrics & Data Science at the VU. Feel free to take a look around!
During an internship at the Dutch Ministry of Defence, I undertook a project to construct magnetic models for a naval minesweeper. The task involved creating a digital twin of a system that had yet to be built. Additionally, software was required to optimize this minesweeper given a target magnetic signature. Moreover, the code had to account for the various external influences at sea. This project spanned three months, and in the end, I received a grade of 9.0 from my supervisor. Due to confidentiality, no code or paper can be provided for this project.
This project delves into the effectiveness of various dimensionality reduction techniques, particularly Principal Component Analysis (PCA), Sparse PCA (SPCA), and gene-expression specific SPCA (G-SPCA), in classifying cancer subtypes based on gene expression data. The study uses data from four different sources, all related to gene expression in various cancer subtypes. It was found that G-SPCA explains slightly more variance than SPCA, with a positive correlation between explained variance and the number of non-zero loadings. However, traditional PCA outperformed the other methods in classification accuracy for most datasets. The study also highlights the computational efficiency of PCA and G-SPCA over SPCA, emphasizing SPCA's impracticality for high-dimensional data like gene expression.
Context is provided by discussing the global impact of cancer and the importance of gene expression data in cancer classification. It emphasizes the role of biomarkers and machine learning algorithms in identifying cancer subtypes and stages. The paper also addresses the challenges of high-dimensionality in gene expression data and the necessity of dimensionality reduction techniques. The research aims to assess how different PCA variants affect the performance and explainability of classification models for gene expression data.
This study contributes to the understanding of dimensionality reduction in gene expression analysis, offering insights into the most efficient techniques for cancer subtype classification. It underscores the balance between computational efficiency and classification accuracy, making it a significant reference for researchers and practitioners in the field of cancer genomics and bioinformatics.
In this project, we tested the impact of hyperparameter tuning on the Non-dominated Sorting Genetic Algorithm II (NSGA-II) in training a multi-objective game-playing agent. We compared a baseline evolutionary algorithm (EA) with default parameters to a version tuned with the open-source framework Optuna. The trained agent was able to defeat more enemies than it was trained on. Moreover, the NSGA-II algorithm appeared to be robust and not significantly impacted by parameter tuning.
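As a rough illustration of the idea behind NSGA-II, the sketch below sorts candidate solutions into Pareto fronts. It is a minimal, unoptimized version and not the project's code: the function names are hypothetical, the objective tuples are made up for the example, and every objective is assumed to be maximized.

```python
def dominates(a, b):
    """Return True if a is no worse than b on every objective and strictly
    better on at least one (all objectives are assumed to be maximized)."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated_sort(points):
    """Group point indices into Pareto fronts; front 0 is the best.

    Simple repeated-scan version for illustration, not the faster
    bookkeeping variant used in full NSGA-II implementations.
    """
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        # A point belongs to the current front if nothing left dominates it.
        front = [
            i for i in remaining
            if not any(dominates(points[j], points[i])
                       for j in remaining if j != i)
        ]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts


# Illustrative objective values only (assumption, not project data).
candidates = [(3, 1), (1, 3), (2, 2), (1, 1), (0, 0)]
fronts = non_dominated_sort(candidates)
print(fronts)
```

The first front contains the trade-off solutions that no other candidate beats on all objectives at once; later fronts are progressively worse.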
Machine learning techniques such as Long Short-Term Memory (LSTM) and random forest were trained on recent high-frequency price data to predict future Bitcoin prices. Combining the predictions of these two models in an ensemble led to more accurate predictions and profitable trading strategies, despite the notorious volatility of Bitcoin prices.
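The sketch below shows one simple way two models' predictions can be combined. How the project actually combined them is not described here, so the equal-weight average, the function names, and the toy numbers are all assumptions made for illustration.

```python
def ensemble_average(preds_a, preds_b, weight_a=0.5):
    """Blend two prediction series with a weighted average.

    weight_a is the weight on the first model (0.5 is an assumed default).
    """
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction series must have the same length")
    return [weight_a * a + (1 - weight_a) * b for a, b in zip(preds_a, preds_b)]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy numbers only (assumption): the two models err in opposite directions.
actual = [100.0, 100.0]
model_one = [98.0, 104.0]
model_two = [104.0, 98.0]

blended = ensemble_average(model_one, model_two)
print(mean_absolute_error(actual, model_one))
print(mean_absolute_error(actual, model_two))
print(mean_absolute_error(actual, blended))
```

When the individual models make errors in different directions, averaging can cancel part of them, which is the usual motivation for an ensemble.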
In our project "Learning to Rank Expedia Searches", we developed an algorithm to improve hotel rankings on Expedia, focusing on displaying the most bookable options. This effort was inspired by the "Personalize Expedia Hotel Searches - ICDM 2013" Kaggle competition. Our approach incorporated advanced tree-based modeling techniques, like Gradient Boosting Machines and LambdaMART, applied to a challenging dataset of over 4.9 million rows.
We adeptly handled issues like missing data and dataset imbalances during preprocessing and feature engineering. In the modeling phase, we employed techniques including K-nearest neighbors, Random Forest, and particularly LambdaMART in LightGBM, tailored for ranking tasks and evaluated using Normalized Discounted Cumulative Gain (NDCG).
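For readers unfamiliar with NDCG, the following minimal sketch computes it for a single ranked list. It uses the common exponential-gain form of the metric; the function names, the cutoff, and the relevance grades in the example are illustrative assumptions rather than the project's evaluation code.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items of a ranked list.

    relevances holds the relevance grade of each item, in ranked order.
    """
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)  # rank is 0-based
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant items, so the score is defined as 0 here
    return dcg_at_k(relevances, k) / ideal_dcg


# Illustrative grades (assumption): higher means a more desirable hotel.
good_ranking = [5, 1, 0]
poor_ranking = [0, 1, 5]
print(ndcg_at_k(good_ranking, 3))
print(ndcg_at_k(poor_ranking, 3))
```

A ranking that places the most relevant items first scores 1.0, and the score drops as relevant items are pushed further down the list.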
A significant success of our project was the effective use and tuning of the LambdaMART model, ensuring its optimal performance. The real-world deployment of this model was a key part of our work, taking into account the need for regular updates and system integration.
This collaborative project led us to achieve an impressive 11th place out of 203 participants. It was a valuable learning experience, enhancing our expertise in machine learning and data management. Our project demonstrates our ability to blend sophisticated data analysis with practical applications, making us well-suited for impactful data science roles.
In this hobby project, I created a Christmas village using smart lighting that plays music and light sequences. This project utilizes a webcam and the OpenCV library to detect when people are walking by in the living room, increasing the likelihood of a sequence being played.
Furthermore, the system can recognize hand gestures. Currently, only the thumbs-down gesture is implemented, which forces the system to play a random new song or start playing a song if nothing is currently playing. The system operates on a laptop running a Vixen server, which communicates with an Arduino over serial, and this in turn communicates with the smart LED strip. Python scripts are run on the laptop to process the webcam streaming data using OpenCV and to send requests to the local Vixen server in order to play, pause, stop, or skip sequences.
Overall, it's a very festive project. The GitHub page provides a video of the system executing a sequence.
My studies in mathematics, covering areas like linear algebra and dynamical systems, have sharpened my problem-solving skills, allowing me to approach complex challenges with analytical rigor and creativity.
With a strong background in statistics from my econometrics coursework, I've honed my ability to distill complex data into actionable insights, balancing technical expertise with practical application.
My proficiency in Python and R is rooted in diverse projects, ranging from developing TensorFlow machine learning models to crafting interactive dashboards. This versatility underscores my commitment to staying at the forefront of coding innovations.
Through several computer science courses covering full-stack development, I've acquired a comprehensive skill set in Apache, SQL, PHP, and Flask, equipping me to create robust, user-centric web solutions that are both efficient and aesthetically pleasing.
Through hands-on Arduino projects, I've cultivated a solid foundation in C++ coding for microcontrollers, blending my interest in hardware programming with a knack for innovative problem-solving.
My extensive use of Git in various projects has not only bolstered my version control capabilities but also enhanced my collaborative coding practices, ensuring high-quality and efficient project outcomes.