My Kaggle Entry Experience

In data science, everyone knows Kaggle: it is where people in the field meet, share ideas, and sharpen their skills. For practitioners, entering a prediction competition is one of the best ways to improve, and for beginners, I have always believed you should go to the Kaggle site and actually take part in a contest.

After a short period of watching from the sidelines, I decided to compete at the end of 2018. In the weeks that followed, while preparing for the competition, I picked up many data science skills I had not been exposed to before. To my surprise, I found that even for a beginner, data competitions are fun.

First-time participants run into some common questions, so I will do my best to demystify Kaggle for you. Above all, I hope to convey its charm and let Kaggle push you forward in data science.

Let's first get to know the platform through a beginner-level "101" competition. In a typical Kaggle competition you receive two data sets: a training set and a test set. The training set is labeled; the test set is not. Your job is to write a model that predicts the labels of the test set.

During the competition, participants can submit their results at any time; a portion of the test set is scored and shown on the public leaderboard. The rankings give contestants a good sense of how they stack up against their opponents, and even for the average contestant it is fun just to watch the top of the table change over the course of a competition.

The chart above is built from the public leaderboard of the recent "Instant Gratification" competition. Each blue line represents a team, and the orange line marks the best score so far.
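The train-on-labeled-data, predict-on-test-data workflow described above can be sketched in a few lines. This is a minimal illustration, not any competition's actual pipeline: the synthetic data stands in for a competition's `train.csv` and `test.csv`, and the model choice is an arbitrary assumption.

```python
# Minimal sketch of the typical Kaggle workflow: fit a model on labeled
# training data, predict labels for the unlabeled test set, and build a
# submission table. Synthetic data is used so the sketch runs standalone.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Stand-in for the competition's train/test files (hypothetical split).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, y_train = X[:800], y[:800]   # training set: labels provided
X_test = X[800:]                      # test set: labels are withheld

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# A submission is just (row id, predicted label) for every test row.
submission = list(zip(range(len(X_test)), model.predict(X_test)))
print(len(submission))  # one prediction per test example
```

In a real competition you would write `submission` out as a CSV in the format the competition page specifies, then upload it to be scored.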
When you look at the chart above, you may wonder why so many teams score so close to one another. The large cluster of teams packed into that dark band of blue lines can be explained by Kaggle notebooks.

Never heard of a Kaggle notebook? A Kaggle notebook is a cloud-based environment where members of the community share the code and ideas behind their prediction models. It is a great setup for beginners, because there is no need to build a local environment, download the data, install packages, or get bogged down in version management. Better still, the hosted virtual machines often outperform a local laptop. Notebooks can be private or public, and one interesting aspect of Kaggle is that public notebooks form a real "competition within the competition" with its own rewards: a public notebook can be upvoted by community members.

In every competition, community members publish public notebooks that explore baseline models for the problem at hand. Data scientists use these notebooks to validate ideas with the community, to build on the work of others, and to step through other people's code.

So why do so many Kaggle scores converge on that dark blue line? That is what happens when a breakthrough notebook is released and people across the competition adopt its code or ideas.

If you do not want to dive into notebooks right away, you can turn your attention to the Kaggle forums instead. They are another good place to get started: data scientists share their ideas, ask questions, and talk shop there. Every competition has its unexpected moments, whether it is a trailing team leaping to the top of the table or a serious data leak being uncovered.

There are also times in a competition when one or two teams pull well clear of the rest of the leaderboard.
The community's guessing game about what the top competitors are doing to reach those scores is often called looking for the "magic", and contestants are at once competitors and spectators.

In recent competitions, I have created some notebooks that track the public leaderboard over time. In the chart above, from the CHAMPS "Predicting Molecular Properties" competition, you can see an example of one team pulling away from the crowd.

Merging teams is an important and common strategy in Kaggle competitions. It lets data scientists collaborate in a safe environment and find synergies between their prediction models. I usually start by setting personal goals and focusing on individual progress; once I have ideas for improving my model, I consider scaling up and working with others.

A Kaggle team merger is no different from collaboration in any other area of professional life: it requires trust, an ethical attitude, and a cooperative mindset. Going solo, on the other hand, is widely regarded as one of the hardest paths on Kaggle. In fact, to become a Competitions Grandmaster you need five gold medals in total, at least one of which must be won solo.

In any given Kaggle competition, the public leaderboard is scored on only part of the test set. A model may, unintentionally, perform very well on the publicly scored portion even though that score does not reflect its true accuracy, and it then drops on the private leaderboard. That is why it matters to build a model that is not only accurate but also generalizes well across both the public and private test sets.

The community calls this phenomenon "overfitting to the public leaderboard". When the final results are tallied, the shake-up can be dramatic! The chart above compares the top 10 public and private teams in a recent competition: you can see that only one team from the public top 10 remained there in the final table.
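The public/private split above can be illustrated with a small simulation (all numbers here are invented, not real competition data): if many equally skilled models are scored on a small public slice of the test set, the one that tops the public leaderboard usually does so by noise, and its advantage evaporates on the larger private slice.

```python
# Toy illustration of "overfitting to the public leaderboard".
# Each "model" is a row of per-example correctness outcomes; the public
# leaderboard sees only a small slice of the test set, the private
# leaderboard the rest. All models have identical true skill (80%).
import numpy as np

rng = np.random.default_rng(42)
n_models, n_test = 200, 1000
public = slice(0, 100)       # small publicly scored slice
private = slice(100, None)   # larger privately scored slice

correct = rng.random((n_models, n_test)) < 0.8  # 80% accuracy for all

public_scores = correct[:, public].mean(axis=1)
private_scores = correct[:, private].mean(axis=1)

best_public = int(np.argmax(public_scores))
print(public_scores[best_public], private_scores[best_public])
# The public winner's private score regresses back toward 0.8:
# its lead on the small public slice was mostly noise.
```

This is exactly the shake-up the chart describes: ranking by the public slice rewards noise, so the final private standings reshuffle.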
This example is actually relatively mild; in some well-known shake-ups, teams have seen their rankings change by hundreds, sometimes even thousands, of places.

Leaderboards, notebooks, forums, dramatic twists, teams, and the final shake-up: if you decide to enter a Kaggle competition, you will run into plenty of things I could not have anticipated here. I cannot think of a better way to improve your machine learning skills than joining Kaggle.