Bojan Stavrikj
Senior Data Engineer
I was part of a project driven by a new ECB requirement to stress test the portfolios of systemically important European banks. I was involved in the data collection, data integration and data transformation process, ultimately building an ETL pipeline to deliver databases containing the information the ECB requires to stress the bank’s portfolio under different climate change scenarios and projection years.
I completed my thesis project in 2022, analyzing a global network of investment relationships. Two datasets were used: the Coordinated Direct Investment Survey (CDIS) and the Coordinated Portfolio Investment Survey (CPIS). Network analysis was used to understand the position of different countries in intermediating investments in the global network. These findings can be used to identify patterns and preferential paths for investment, establish trends, and describe the relations between countries over time. The results are visualized in an interactive web application developed mainly with d3.js. The visualizations include a complex node-link force-directed graph, as well as simpler bar charts, line charts and tabular representations. The web application is available at: https://fi-networks.com/.
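The core idea of measuring a country's intermediating position can be sketched in a few lines. The snippet below uses a toy edge list with made-up country codes and amounts (not CDIS/CPIS figures) and a deliberately crude proxy: a country intermediates a lot when both its incoming and outgoing investment strength are high, so we take the minimum of the two. The thesis itself used proper network-analysis measures; this only illustrates the kind of computation involved.

```python
from collections import defaultdict

# Toy directed investment edges (origin, destination, amount in $bn).
# Illustrative numbers only, not actual CDIS/CPIS data.
edges = [("LUX", "USA", 1.2), ("LUX", "DEU", 0.8), ("USA", "LUX", 0.9),
         ("NLD", "LUX", 0.6), ("USA", "NLD", 0.4), ("DEU", "USA", 0.3)]

in_strength, out_strength = defaultdict(float), defaultdict(float)
for src, dst, amt in edges:
    out_strength[src] += amt  # investment flowing out of src
    in_strength[dst] += amt   # investment flowing into dst

# Crude intermediation proxy: a country that both receives and sends
# large flows sits "in the middle" of the network.
intermediation = {c: min(in_strength[c], out_strength[c])
                  for c in set(in_strength) | set(out_strength)}
top = max(intermediation, key=intermediation.get)
```

On this toy data the proxy flags the entrepôt-style hub (here "LUX") as the main intermediary, which is the qualitative pattern the thesis investigates with more rigorous centrality measures.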
I built a transfer planner and player comparison web application for Fantasy Premier League managers. The data was obtained through the Premier League and Understat APIs. I used Python for data cleaning and aggregation, while the app itself was built mainly with HTML, CSS and JavaScript (d3.js). The app allows users to import their team by ID and view the players they currently own. The transfer planner section makes it possible to simulate transfers in a complete one-page view where all the necessary information is available. Additionally, users can visually compare the past and expected stats of players and understand which asset is the most valuable. The web app is available at: https://fplmania.com.
While working on some of the projects outlined in this section, I needed to obtain data that would contribute to my models. In one such scenario the team needed weather data. After several failed attempts at finding structured files with weather information, I realized I should scrape this data myself. This is when I started learning how to web scrape. After successfully writing this code for myself, I decided to create a post with the code and an explanation so others in similar situations can get this data (available in the content section).
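A minimal sketch of the idea, using only the standard library: parse an HTML table into a list of records. The sample HTML below stands in for a weather page; the real site's structure (and the fetching step, e.g. with `urllib.request`) is site-specific and not shown here.

```python
from html.parser import HTMLParser

# Toy HTML standing in for a scraped weather page; real pages differ.
SAMPLE = """
<table id="weather">
  <tr><th>date</th><th>temp_c</th></tr>
  <tr><td>2020-01-01</td><td>11.2</td></tr>
  <tr><td>2020-01-02</td><td>9.8</td></tr>
</table>
"""

class WeatherTableParser(HTMLParser):
    """Collect cell text row by row from the table."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._row.append(data.strip())

parser = WeatherTableParser()
parser.feed(SAMPLE)
header, *records = parser.rows
weather = [dict(zip(header, rec)) for rec in records]
```

In practice a library like BeautifulSoup makes this terser, but the stdlib version shows that scraping is ultimately just fetching HTML and walking its tags.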
The company had trouble determining the right number of drivers to hire for different periods of the year. A predictive model was built using machine learning techniques to estimate the expected number of services. The main challenges in this project were data quality and aggregation. Additionally, the client's request to predict four-week batches eight weeks in advance made the project more challenging, although this constraint was necessary for the model to be effective given the long hiring process. The target set by the client was a maximum mean absolute error of 10%, while we managed to achieve a 5% error on average. The model could be improved further with more, higher-quality data.
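The evaluation setup can be illustrated with a toy example: hold out a four-week batch, forecast it with a naive seasonal baseline, and score the forecast with a percentage error metric like the one behind the 10% target. The weekly numbers and the 12-week season below are invented for illustration; the actual model and data are not shown.

```python
# Synthetic weekly service counts; illustrative only.
history = [120, 132, 101, 134, 190, 180, 210, 193, 150, 140, 160, 170,
           125, 138, 110, 140, 198, 185, 215, 200, 155, 145, 165, 176]

def seasonal_naive_forecast(series, horizon, season=12):
    """Predict each future week with the value from one season earlier."""
    return [series[-season + h % season] for h in range(horizon)]

def mape(actual, predicted):
    """Mean absolute percentage error (the style of metric behind the 10% target)."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual) * 100

# Hold out a 4-week batch; with an 8-week lead time the real model would be
# trained only on data available 8 weeks before the batch starts.
train, test = history[:-4], history[-4:]
pred = seasonal_naive_forecast(train, horizon=4)
error = mape(test, pred)
```

Any candidate model has to beat a baseline like this one on the same holdout before it is worth deploying; batching by four weeks simply means scoring four predictions at a time.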
The data used is real and was obtained from a hotel based in Lisbon, Portugal. The hotel's issue was a staggering 40% booking cancellation rate, largely driven by Online Travel Agencies (OTAs) such as Booking.com and Airbnb, which often give clients the flexibility of free cancellation. To tackle this problem, a predictive model was built that ultimately achieved 81% accuracy. Deploying this model would allow management to allocate resources better and decrease the total number of vacant rooms at any given time, by allowing over- or underbooking flexibility when expected cancellation rates are high or low.
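The over/underbooking idea can be made concrete with a simple decision rule: if a fraction p of bookings is expected to cancel, the hotel can accept roughly capacity / (1 − p) bookings. The 2% safety buffer and the numbers below are illustrative assumptions, not figures from the project.

```python
def sellable_rooms(capacity, predicted_cancel_rate, buffer=0.02):
    """
    Toy overbooking rule: accept bookings up to capacity / (1 - p),
    with a small safety buffer subtracted from the predicted rate.
    Buffer and cap are illustrative assumptions, not project parameters.
    """
    rate = max(0.0, min(predicted_cancel_rate - buffer, 0.95))
    return int(capacity / (1 - rate))

# High predicted cancellations -> accept well over physical capacity;
# low predicted cancellations -> stay close to capacity.
high = sellable_rooms(100, 0.40)
low = sellable_rooms(100, 0.05)
```

A classifier with 81% accuracy feeds this rule indirectly: aggregating per-booking cancellation probabilities for a given night yields the expected cancellation rate that the rule consumes.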
The data used is real and was obtained from a hotel based in Lisbon, Portugal. At the time, the hotel's customer segmentation was based solely on the booking platform each customer had last used. Our team obtained data on a large number of the hotel's clients and performed an in-depth analysis of the types of customers who book stays there. The end product was four segments for the specific static dataset. Lastly, our team gave a detailed description of each cluster along with suggestions on deployment, maintenance and marketing strategies for each customer profile.
The data used is synthetic and concerns an automotive company with 110 different models for sale, grouped in 7 major product lines: classic cars, vintage cars, motorcycles, trucks, planes, ships and trains. Records of sales orders are available for the period from January 2003 to May 2005. The goal of the project was to develop a data warehouse with a star schema comprising a fact table and several dimension tables. This was done in MySQL and later connected with Pentaho to schedule jobs for running the ETL processes. Lastly, a dashboard was created using Power BI to give the sales department of Classic Models a clear overview.
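The star-schema shape can be sketched compactly. The snippet below uses SQLite via Python's standard library purely so the example is self-contained (the project itself used MySQL); the table names, columns and sample rows are illustrative, not the actual warehouse design.

```python
import sqlite3

# Minimal star schema: one fact table referencing two dimension tables.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY,
                          model TEXT, product_line TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY,
                       order_date TEXT, year INTEGER, month INTEGER);
CREATE TABLE fact_sales (product_key INTEGER REFERENCES dim_product,
                         date_key INTEGER REFERENCES dim_date,
                         quantity INTEGER, amount REAL);
""")
cur.executemany("INSERT INTO dim_product VALUES (?,?,?)",
                [(1, "1952 Alpine Renault", "Classic Cars"),
                 (2, "1962 Vespa", "Motorcycles")])
cur.executemany("INSERT INTO dim_date VALUES (?,?,?,?)",
                [(1, "2003-01-15", 2003, 1), (2, "2003-02-10", 2003, 2)])
cur.executemany("INSERT INTO fact_sales VALUES (?,?,?,?)",
                [(1, 1, 2, 7000.0), (2, 2, 5, 1200.0), (1, 2, 1, 3500.0)])

# Typical dashboard query: revenue per product line per year.
rows = cur.execute("""
SELECT p.product_line, d.year, SUM(f.amount) AS revenue
FROM fact_sales f
JOIN dim_product p USING (product_key)
JOIN dim_date d USING (date_key)
GROUP BY p.product_line, d.year
ORDER BY revenue DESC
""").fetchall()
```

This join-and-aggregate pattern is exactly what the Power BI dashboard runs behind the scenes; Pentaho's role is to keep the fact and dimension tables refreshed on schedule.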
I developed a new controlling tool for the team. It spread over more than 10 interconnected sheets, controlling and reconciling data from 3 different systems at several levels of grain. The challenge was to have all the data in one place while keeping it fast, without having to go through many formulas to obtain the final result. Data would be input over several sheets from 3 different sources; manual ledger adjustments would be added to the file depending on where the system was breaking and needed to be adjusted. All of this would then be summarized as a simulation of what the final numbers would be after all necessary adjustments were uploaded in the system. This summary was shown on a daily, month-to-date and year-to-date basis, including a difference calculation against the initial risk system. After all of this was done, a final system number would be input for a final check. Once the trading desk approved and agreed with the final reported numbers, a macro distributed them to a list of managers within the company.
I am part of the Stress Testing Financial Synthesis (STFS) team at BNP Paribas, and within STFS the sub-team of Stress Testing Data Analytics (STDA), where we process data from different data streams in the bank. I work in a big data environment, leveraging Hadoop and PySpark to process huge amounts of data and prepare it for the modelling team to stress the portfolio based on different types of shocks. The main project I was part of is a new ECB requirement that stresses the portfolios of systemically important European banks. I was involved in the new data collection, data integration and data transformation process, ultimately building an ETL pipeline to deliver databases containing the information the ECB requires to stress the bank’s portfolio under different climate change scenarios and projection years.
Within Citibank, Product Control (PC) is the largest department in Finance, with responsibility for controlling daily profit and loss reporting, price verification and new trading activity for the ICG in EMEA. The department is organised into business-aligned teams and the product scope is comprehensive, comprising cash, derivative, and structured and exotic variants of the following asset classes: credit, FX, equity, money markets, commodities and rates. I worked closely across functions on a daily basis (including the trading desks, Risk Management, Operations and other areas of Finance) and developed a good understanding of the products traded, along with the associated market risks and accounting complexities.
The internship was taken as part of my bachelor's degree, with the main goal of studying the importance of financial controlling in keeping Magyar Telekom competitive and at the forefront of the market. While I was there, they were developing new strategic packages to be released on the market in response to competitors' moves.
Exploratory Data Analysis
Market Analysis
Business Dashboards
Predictive Modeling
Customer Base Analytics
Data Engineering
Python
JavaScript
SQL
PowerBI
Web Scraping
Machine Learning
HTML5 & CSS
PySpark
Major in Business Analytics
Thesis: Global cross-country investment relationships – using network analysis and interactive visualization techniques
Average: 17/20
Major in International Economics
Thesis: Empirical analysis on “The Relationship Between Oil Price Fluctuations and Exchange Rates of Net Oil Exporters”
Average: 6.5/10
Higher Level - Economics, Biology and Theatre
Standard Level - Math, English and Spanish
I am a Macedonian national, currently living in Lisbon and pursuing my master's in data science with a major in business analytics. After living in Macedonia through elementary school, I moved to Budapest with my family and lived there until graduating from high school, picking up Hungarian quite fluently along the way.
My desire to leave home and live the college life led me to Rotterdam in the Netherlands, where I completed my bachelor's in Economics at Erasmus University Rotterdam. Four years in, I decided to move back to Budapest for a position as Junior Product Control Analyst at Citibank, acquiring knowledge and experience in finance, mostly focused on equity derivatives and corporate equity derivatives.
Fast forward to 2020: I am currently expanding my knowledge in data science at NOVA IMS in Lisbon, getting hands-on experience with supervised and unsupervised machine learning models, as well as forecasting, classification, prediction and segmentation methods on data sourced from established companies.
I am actively seeking opportunities in the field of Data Science in Lisbon. Feel free to get in touch so we can chat about how my skills can meet your current requirements in data science or business analysis!
Native
Fluent
Fluent
Advanced
Basic