Recommended Data Science Content Sources
You are what you eat, and it's your job as a knowledge worker to be on the lookout for a good information diet. In this post, I want to share the sources of information on data science, AI, and the surrounding tech that I've found most useful or appealing. I hope it helps you as well!
Of course, this is all Just My Opinion™. If you think I should change something, feel free to yell at me @Guy_T_Sky :)
In no particular order:
Two Minute Papers
Good for staying up to date, updated frequently.
The host, Karoly, has an infectious enthusiasm and positivity for all the topics he covers.
Expect coverage of interesting papers not just about AI, but also about computer graphics and other visually stunning topics.
Yannic Kilcher
Yannic explains prominent deep learning papers in a thorough, technical way. Rather than reading the paper yourself, it's often faster and easier to watch one of his videos in order to understand important papers in-depth. The explanations capture the punchline of the papers in a deep way, without handwaving away the math or getting lost in the weeds.
Yannic also shares his more subtle perspectives - how papers relate to each other, interpretations of the wider meaning and how seriously to take the results, etc. These insights are harder for newcomers (or non-academic practitioners) to arrive at by themselves.
Distill.pub
In their own words:
Machine Learning Research Should Be Clear, Dynamic and Vivid.
Distill Is Here to Help.
Distill is a unique publication for machine learning research. It promotes articles which use stunning visualizations to give the reader a more intuitive understanding of the topics. Spatial reasoning and imagination tend to work very well for understanding topics in machine learning & data science. This is not surprising, considering so much of the fundamental math behind the fields is linear algebra and calculus. In contrast, traditional publication formats tend to be rigid in their structure, static, dry, and sometimes unnecessarily "mathy".
Chris Olah, one of the creators of Distill, also has an amazing personal blog: https://colah.github.io/. It hasn't been updated in a while, but it's still a collection of some of the best explanations of deep learning ever written. The explanation of LSTMs in particular was a great help to me!
Sebastian Ruder
Sebastian Ruder writes a super high-quality blog and newsletter, primarily about the intersection of neural networks and NLP. He also has lots of advice for researchers, and reports on academic conferences, which could be very useful if you're in academia.
His articles tend to take the form of surveys - summing up and explaining the state-of-the-art research and techniques in an area, which makes them extremely useful for practitioners who want to orient themselves fast.
Andrej Karpathy
Andrej Karpathy needs no introduction! Besides being one of the best-known deep learning researchers on earth, he is a font of creativity, creating widely used tools like arxiv sanity preserver as side projects.
Countless people have entered the field via his Stanford cs231n course, and you would benefit from learning his neural network training recipe by heart.
I also recommend watching his talk about the real-life problems Tesla needs to overcome when trying to apply machine learning at massive scale in the real world. It's impressive, informative and sobering.
Besides writing about ML directly, he also writes some good life advice for aspiring researchers.
Uber Engineering
Uber's engineering blog is truly impressive in scope and breadth, covering a ton of topics, AI in particular.
What I particularly like about Uber's engineering culture is their tendency to spin off super interesting and valuable open source projects at a head-spinning pace. Some examples are:
- https://github.com/uber/ludwig
- https://github.com/uber/h3
- https://github.com/uber/react-vis
- https://github.com/uber/aresdb
- And the list goes on and on and on... Hats off, Uber 🎩
OpenAI Blog
Putting aside any controversy, the OpenAI blog is undeniably beautiful. From time to time, it posts content about deep learning insights which only OpenAI's massive scale can reach, such as the hypothesized Deep Double Descent phenomenon. They tend to post infrequent, high-impact pieces, so it has a high signal-to-noise ratio.
Taboola Blog
The Taboola blog is not as well known as some of the other suggestions in this post, but I find it unique - it deals with very down-to-earth, real-life problems when trying to use ML in production for "normal" businesses - less self-driving cars and RL agents beating world champions, more "how do I know if my model is now predicting things with fake confidence?"
These problems are relevant for almost everyone working in the field, and they get less press coverage than the more sexy AI topics, but solving them correctly still requires world-class talent. Thankfully, Taboola has both that talent AND the willingness and ability to write about it, so that others can learn as well.
Reddit
Alongside Twitter, there's nothing quite like Reddit to get caught up on papers, tools, and the wisdom of the crowds.
State of AI
Published only annually, but contains very dense information content.
Relative to the other sources in this list, it's more accessible to (non-technical) business people.
What I like about the report is that it tries to give a more holistic view, from 10,000 feet, of where the industry and research are going - tying together advances in hardware, research, business, and even geopolitics.
Be sure to start by skipping to the end to read about the conflicts of interest :)
Podcasts
To be frank, in my opinion podcasts are problematic for learning about technical topics. Most of them have a hard time explaining the things that need explaining using audio only, as data science is a very visual field. Podcasts tend to succeed only in giving you leads for deeper investigation later, or in fun philosophical discussions.
Nevertheless, here are some recommendations:
- Lex Fridman's podcast, when he has on prominent researchers from the field of AI. The Francois Chollet episodes are particularly good!
- Data Engineering podcast. Good for hearing about new data infrastructure tools in your audio-only time (though COVID has cut down on that time...).
Awesome Lists
Less a content source to follow, more a useful resource for when you know what you're looking for:
- Matty Mariansky
Matty finds beautiful & creative ways to use neural networks, and it's just fun to see his results in your Twitter feed.
- Ori Cohen
Ori is a blog-writing machine (not literally... I think). He writes profusely about problems and solutions for data scientists in the trenches. Be sure to follow him to get notified when he does.
His compendium in particular is really impressive.
- Jeremy Howard
Co-founder of fast.ai, notebook crusader, and all-round amazing font of creativity and productivity.
- Hamel Husain
Staff ML engineer @Github, Hamel is busy at work making and reporting on many tools for data coders.
- Francois Chollet
Creator of Keras, currently pushing to update our notions of what intelligence is and how to test for it.
- Gwern
Probably the most colorful persona in this list. They certainly aren't boring!
- Hardmaru
No one can train slime balls to play volleyball like Hardmaru can.
Machine & Deep Learning Israel
I may be a bit biased, but I feel that Israel (my home country) has an amazingly vibrant and professional data science community and ecosystem, with top-tier talent and a very practical, no-nonsense approach.
This Facebook group is always buzzing about the latest trends, events, and open source projects, as well as deep discussions about more timeless subjects like career planning. Top-tier professionals in the field weigh in frequently, and the result is an amazing resource for anyone who wants to learn about the field.
Unfortunately, this resource is only relevant if you happen to be a Hebrew speaker. Sorry, 99.9% of the population of Earth ¯\_(ツ)_/¯
Conclusion
This blog post may be updated as I find more wonderful sources of content that would be a shame to leave out of the list. Feel free to contact me @Guy_T_Sky if you want to recommend a new source!
DAGsHub is hiring a Data Science Advocate, so if you are creating your own data science content, we think we have a very fun offer for you!