This summer Auquan ran its first London challenge, an inter-university event between the best London has to offer. Students and societies from Imperial, King’s, LSE and UCL all vied to win the inaugural London Summer Challenge crown.
Whilst the different factions of London’s university scene worked hard to take the lead, our winning team had a different plan.
Raymond and Rui, from Imperial and UCL respectively, decided to team up and create a team with the combined power of the top universities in London, if not the world. This combined force showed as they convincingly won the competition with an impressive solution.
We caught up with them after the fact. Read on for what they had to say.
Tell me a bit about your background:
Raymond: 3rd year turning 4th year physics student at Imperial College London. We saw the competition really late and were both working at the same internship over the summer, so decided to work together.
Rui: I’m studying an MEng in Computer Science at UCL, soon to be starting my 3rd year (in September). As well as the internship Raymond mentioned, I am also working as a machine learning engineer for Weza Ventures in Nairobi, Kenya. It is part of the Royal Academy of Engineering’s exchange programme. I am currently developing an interactive automated machine learning application to predict credit scores from user-provided data.
Oh nice, Raymond, what does your company do?
Raymond: It’s a fintech that is building a trading execution platform and our goal is to use the data from clients to improve the efficiency of the platform. For my project I’m looking at an automated trading execution protocol, using machine learning to try and predict the best execution rules for clients to trade on.
Raymond: Rui was working on smart dealer selection. When you trade you have different dealers that you could include and we are trying to predict who the best dealer will be and what their price will be.
Raymond: Eventually, we hope to combine our two final projects together to improve the overall platform experience for clients.
How long have you been involved in this internship?
Raymond: Just over the summer this year. (Approx 2 months).
How did you start learning and working on data science skills?
Raymond: I think the reason why I chose data science is that in physics labs you have a lot of data generated by your instruments. To better understand the information you collect and to use it to prove your hypothesis you end up using a lot of advanced data techniques. All those techniques I found really interesting, so that drove me to this field. I’m also quite interested in AI and machine learning topics.
Rui: I did a year-long machine learning project in my second year as part of the degree: predicting the star rating and helpfulness of reviews for Ocado Technology. I really enjoyed it, so I decided to specialise in this area!
And how long have you been doing data science projects?
Raymond: About a year. I started my first project with BlackRock, using sentiment analysis to predict stock price changes. Then after that, I did some data competitions from top investment banks. Then obviously things like Kaggle.
Rui: I believe the best way to stay up to date or learn is by doing projects to solve real-world problems, so I took two data science internships this summer.
Rui: I haven’t really done many competitions before. I have participated in a few hackathons and Google Hash Code though.
Is it specifically finance projects that interest you, and what about them do you find interesting?
Raymond: I tend to work more closely towards the finance industry, but not really. Umm, I’m not sure, I guess… In a Citadel project I did, they gave information on water quality and social factors for US states, and you found that students and the middle class moved out towards the coast whilst the working class are starting to move into middle America. What I found amazing was being able to tell a story with the data that no one else was able to.
That sounds great to me. What is it about all these problems that makes you decide to take part?
Raymond: I haven’t actually turned down any problems yet. I can’t really say what sort of problems I don’t want, so long as the organisers make the data set very clear and have a direction for users to go then I think it’s worth checking out. Otherwise, say you have some random data and not really any direction, it’s kind of impossible to create an analysis out of nowhere.
And why do you choose to do coding challenges when there are other ways to get involved with data science?
Raymond: For one, I really enjoy coding. Secondly, as I’m not doing a computer science degree I want to get experience and qualifications in coding and data science. I think taking part in competitions is a good way to show my initiative and interest.
Rui: I like hackathons because they don’t take very long. Everything is done in just 24 hours!
Are there any other things you do?
Raymond: My interest in data is somehow related to finance, so I’m also really into algo-trading and I’m part of an algo-trading team at Imperial. We are currently building our first-generation trading robot that is going to run on a virtual platform.
If we move on to your approach for this competition can you explain how you tackled the problem?
Raymond: So immediately from the copy, it was clear that part one was a feature selection problem. I’ve done a couple of these problems before so I started by looking back through my previous work. There are two main approaches to feature selection: correlation analysis and model-based selection. Looking at your template file, I saw that you had provided a correlation analysis approach, so we decided to avoid this approach and our answer is completely based on model-based selection.
Raymond: I used a lot of different models and then ensembled them together at the end. The reason I did this was, how do you say, to balance out the bias of the data set. In each type of model you have a different bias, for example: in LASSO, you have L1 regularisation, which reduces the number of parameters. Whereas in a random forest you have an ensemble tree model, where if you set the depth of the tree too deep you will overfit, but too shallow you will overlook some features. My reasoning was that putting all these models together would lead to a more balanced answer.
Raymond: It isn’t too hard to do as long as you put in work upfront and make sure you have the correct logic and know which models you want to put into the ensemble. Then you just code it out!
Rui: Basically that. Research, implementation, and testing!
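(Editor’s aside: for readers who want to picture the ensembling step Raymond describes, here is a minimal sketch. It is not the team’s code. It assumes each model has already produced one importance score per feature, for instance absolute LASSO coefficients or random forest importances, and it assumes a simple combination rule of normalising each model’s scores and averaging them. The function names, feature names and numbers are all made up for illustration.)

```python
from statistics import mean


def normalise(scores):
    """Scale absolute scores so they sum to 1, making models comparable."""
    magnitudes = [abs(s) for s in scores]
    total = sum(magnitudes)
    if total == 0:
        return [0.0] * len(scores)
    return [m / total for m in magnitudes]


def ensemble_importance(scores_by_model):
    """Average each feature's normalised importance across all models."""
    normalised = [normalise(s) for s in scores_by_model.values()]
    return [mean(column) for column in zip(*normalised)]


def select_top_k(feature_names, scores_by_model, k):
    """Return the k feature names with the highest averaged importance."""
    combined = ensemble_importance(scores_by_model)
    ranked = sorted(zip(feature_names, combined),
                    key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical scores for four made-up features (not competition data).
feature_names = ["f0", "f1", "f2", "f3"]
scores_by_model = {
    "lasso_coefficients": [0.0, 1.5, -0.5, 0.0],   # L1 zeroes some out
    "forest_importances": [0.1, 0.5, 0.1, 0.3],
}

selected = select_top_k(feature_names, scores_by_model, k=2)
print(selected)  # ['f1', 'f2']
```

In this toy case the forest alone would rank f3 above f2, while LASSO drops f3 entirely. Averaging the two lets each model’s bias temper the other, which is the kind of balancing Raymond mentions.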
How did you split up tasks and work within your team?
Raymond: I am more familiar with these models, so I was in charge of the model selection process and Rui used his computer science background to implement them with testing. So, we found that separating the task worked.
Raymond: Honestly, we could both do all the tasks ourselves. We just thought that, because we didn’t have much time, this would save time and help us meet our deadline.
Would you give any advice to other teams?
Raymond: I’ve worked on a couple of problems similar to this in teams and my advice would be: Don’t try and put two people in the same direction at first. At the start, you are trying to make each of your teammates go in different directions, before bringing it back again at the end. Otherwise, the arguments will never stop.
Rui: Just make sure that everyone knows what they need to do before starting to work on the problem.
Whilst we are on the topic of giving advice, is there any advice you would give people looking to get started with financial data science?
Raymond: I don’t know if it’s good to ask me that! I’d recommend Kaggle as your first challenge. (David: Kaggle kernels are a great place to learn general data science skills. When you’ve learnt something new come back to us to test it out in the quant world!). I would do this because of the kernel feature where you can see other people’s notebooks and solutions for a particular problem. So they can share some knowledge and whilst you are doing the problem you can learn from what other experts are doing in their solution. This is actually how I started.
Finally, what are your aspirations for your data science career?
Raymond: I’m thinking about doing a master’s or PhD in data science or a related computer science major. I kinda want to have a very systematic approach to learning about this field and understand the fundamentals and how everything fits together, like a skill tree. After that I would like to work in a big tech or finance company but we will see how it goes.