
Me as a data bee

Introduction

Data Science is getting a lot of attention. An article from Harvard Business Review called Data Scientist "The Sexiest Job of the 21st Century". 1 Meanwhile, many in academia frown, thinking it's just statistics. There is also a valid question of whether Data Science is really a science. It seems much of the excitement comes from industry, and the skepticism from academia. I felt a little silly declaring myself a Data Scientist at first, especially during a hype. But when I took a moment to look deeper, I saw that Data Science covers something I've always been passionate about. It brings together statistics, coding, and machine learning, all focused on solving real-world problems. Visualization also matters, so that the value of the information is perceived by people effectively. So, I decided to re-define myself as a Data Scientist. I hope that I will make a good one. Because it will be a long journey, I thought it might be a good idea to check where I am starting from by writing a series of blog posts reflecting on my technical background related to Data Science.

The first post is about statistics.

Part 1. Statistics

I am not a statistician. I took basic statistics courses in college, used that knowledge in the course of my graduate research, and I am always a student of more advanced statistical topics. One interesting thing to note, though, is that the area where I used statistical knowledge the most was high frequency trading. One day, I took up the hobby of currency speculation. This hobby brings either monetary profit or loss because I buy and sell foreign currencies based on the speculation of whether they will gain or lose value against the dollar. If my speculation is correct, I make money; otherwise, I lose it. I wanted the process of making investment decisions to be as quantifiable as possible.

There are two components in the strategy: timing and allocation. Timing is about when to enter the market (by buying or short-selling the currency) and when to exit (by doing the reverse operation). I used a technical analysis called Elliott Wave 2, and made a preset rule that I use consistently on every trade. Each trade generates data, including how much I gained and how much I lost. This data is used to decide the "allocation" for future trades. By allocation, I mean how much of my total capital is to be exposed to risk. The bigger the position I hold, the bigger the portion of capital that is at risk. Using the trading results, I created the probability density function of the returns on recent trades. Then I applied the so-called Kelly Criterion 3 to calculate the per-trade bet size that maximizes the long-term return.
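To make the allocation step concrete, here is a minimal sketch (not the exact code I used) of estimating the Kelly fraction numerically: given a hypothetical set of per-trade returns, it searches for the bet fraction that maximizes the average log growth per trade.

```python
import numpy as np

# Hypothetical per-trade returns: fractional gain or loss per unit of capital
# placed at risk (e.g. +0.04 means a 4% gain on the exposed amount).
returns = np.array([0.04, -0.02, 0.06, -0.03, 0.05, 0.01, -0.02, 0.08, -0.04, 0.03])

def expected_log_growth(f, r):
    """Average log growth per trade when a fraction f of capital is exposed."""
    return np.mean(np.log1p(f * r))

# Search bet fractions from 0% to 100% of capital and keep the one that
# maximizes long-run (log) growth -- the empirical Kelly fraction.
fractions = np.linspace(0.0, 1.0, 1001)
growth = [expected_log_growth(f, returns) for f in fractions]
kelly_fraction = fractions[int(np.argmax(growth))]

print(f"Bet {kelly_fraction:.1%} of capital per trade")
```

Of course, the estimate is only as good as the sample of recent trades it is computed from, which leads to the caveat below.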

As applied statistics, it is far from perfect. For example, it assumes my future odds are consistent with my past performance. But it was good enough to make over a 20% return on investment per year with very small fluctuations in capital. More importantly, it reduced the uncertainty in the decision-making process, including emotional, compulsive mistakes.

Original post: April 22, 2014 | Last updated: April 22, 2014
