Not too long ago, most makers didn't care about building email lists. Now, a lot of very serious people say that you should start building your email list as soon as you launch your website. Based on a casual Google search, you could make the case that machine learning and data services are starting to approach that level of ubiquity.

Today, we live at the confluence of two major societal trends. The first is the Great Fragmentation, in which everything gets cheaper and simpler. The second is the Great Acceleration, in which the pace of techno-economic change increases exponentially. Machine learning, in turn, is driving both of these movements.

For the sake of conversation, you can think of machine learning as a skill set for predicting customer interactions based on historical usage trends. This skill set includes characterizing the data, cleaning the data, and building features. The first two skills are fairly straightforward. Characterizing the data is much like taking an inventory, and cleaning the data is like making sure that all of the values in a column are of the same type.
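To make the inventory and cleaning analogies concrete, here is a minimal pandas sketch. The tiny data set and its column names (`customer_id`, `visits`, `signup_date`) are hypothetical, made up for illustration. Characterizing means asking how big the table is, what type each column is, and how much is missing; cleaning here means forcing each column to a single type, with anything unparseable turned into a missing value.

```python
import pandas as pd

# Hypothetical customer data (illustrative only): numbers stored as text,
# a junk value ("n/a"), and an impossible date ("2019-02-30").
raw = pd.DataFrame({
    "customer_id": [101, 102, 103, 104],
    "visits": ["3", "7", "n/a", "2"],
    "signup_date": ["2019-01-05", "2019-02-11", "2019-02-30", "2019-03-02"],
})

# Characterizing: take an inventory of what you have.
print(raw.shape)         # (rows, columns)
print(raw.dtypes)        # the type pandas inferred for each column
print(raw.isna().sum())  # missing values per column

# Cleaning: make every value in a column the same type.
clean = raw.copy()
# Text -> numbers; anything that is not a number becomes NaN.
clean["visits"] = pd.to_numeric(clean["visits"], errors="coerce")
# Text -> dates; anything that is not a valid date becomes NaT.
clean["signup_date"] = pd.to_datetime(clean["signup_date"], errors="coerce")

print(clean.dtypes)
print(clean.isna().sum())  # the bad values now show up as missing
```

Coercing bad values to missing, rather than deleting rows immediately, is one reasonable choice: it keeps the inventory honest and leaves the decision to drop or fill those gaps for later.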

By far, feature building is the sexiest of all the skills involved in machine learning. Feature building is what separates one data scientist from another. For example, a group of data scientists tasked with solving the same problem using the same data can come up with very different solutions, because each of them engineers different features from that data.
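As a rough illustration, the sketch below turns a raw log of customer interactions into one row of features per customer. The event data, the column names, and the cut-off date `as_of` are all hypothetical, and these particular features (counts, totals, averages, recency) are just one possible choice; another data scientist could build a very different set from the same log.

```python
import pandas as pd

# Hypothetical event log: one row per customer interaction (illustrative data).
events = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2],
    "timestamp": pd.to_datetime([
        "2019-03-01", "2019-03-04", "2019-03-20", "2019-03-02", "2019-03-03",
    ]),
    "amount": [20.0, 35.0, 10.0, 50.0, 5.0],
})
as_of = pd.Timestamp("2019-03-31")  # assumed cut-off date for computing features

# Feature building: summarize many raw rows into per-customer signals.
features = events.groupby("customer_id").agg(
    n_interactions=("timestamp", "count"),
    total_spent=("amount", "sum"),
    avg_spent=("amount", "mean"),
    last_seen=("timestamp", "max"),
)
# Recency: how many days before the cut-off the customer was last active.
features["days_since_last_seen"] = (as_of - features["last_seen"]).dt.days
features = features.drop(columns="last_seen")

print(features)
```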

Now, all of this may seem daunting; after all, this is data science. However, you have to keep in mind that we live at the confluence of two major societal trends: the Great Fragmentation and the Great Acceleration. Everything in data science is getting cheaper because of Amazon Web Services and other cloud computing platforms. It's getting simpler because of open-source software projects like Docker and Python.

In other words, you don't have to be a Ph.D. candidate to take advantage of the opportunities that machine learning has to offer. There is still a steep learning curve. However, you don't have to be an expert to learn how to characterize the data available to you, or to clean it so that it can eventually be manipulated with Python-based tools like pandas.

Where should you start?

The best place to start is Kaggle.com. Kaggle is a website that gives you access to both the tools and the data you need to get started with machine learning. Work on Kaggle is organized around Jupyter Notebooks, which combine code, data, and results in one place. Notebooks are great because they let you change values and immediately see what effect that has on your models.
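The kind of experiment a notebook encourages looks roughly like the sketch below: change one value, rerun, and compare. It uses synthetic data (a noisy sine wave, an assumption for illustration rather than a Kaggle data set) and plain NumPy polynomial fits as the "model". The value being manipulated is the polynomial degree, and each model is scored on points held out from training.

```python
import numpy as np

# Synthetic data (assumed for illustration): a noisy sine wave.
rng = np.random.default_rng(seed=0)
x = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

# Hold out every fourth point so each model is judged on data it never saw.
is_test = np.arange(x.size) % 4 == 0
x_train, y_train = x[~is_test], y[~is_test]
x_test, y_test = x[is_test], y[is_test]

# The "value to manipulate": the polynomial degree of the model.
test_mse = {}
for degree in (1, 3, 5):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    predictions = np.polyval(coeffs, x_test)
    test_mse[degree] = float(np.mean((predictions - y_test) ** 2))
    print(f"degree={degree}: test MSE = {test_mse[degree]:.3f}")
```

In a notebook you would typically put the loop body in its own cell and edit `degree` by hand, rerunning the cell each time; the loop simply does the same thing in one pass.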

How do you find data that resembles the type you anticipate collecting?

You can go to the ProgrammableWeb website, which maintains a huge directory of Application Programming Interfaces (APIs). For practice purposes, you can think of APIs as sources of pre-packaged data that you can use to learn how to manipulate data sets. Many of them serve real data from real companies, which you can use to build machine learning models.
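Many web APIs return their data as JSON, often with nested fields. The sketch below shows one way to get such a response into a pandas table. No particular API or endpoint is assumed: `fetch_json` takes whatever URL you choose (many APIs also require a key, which is not handled here), and the `sample` payload and its field names are hypothetical stand-ins for a real response.

```python
import json
from urllib.request import urlopen

import pandas as pd


def fetch_json(url: str):
    """Download and decode a JSON response from a URL you supply."""
    with urlopen(url) as response:
        return json.load(response)


def to_frame(records: list) -> pd.DataFrame:
    """Flatten a list of (possibly nested) JSON records into a table."""
    return pd.json_normalize(records)


# Hypothetical payload shaped like many REST API responses (illustrative only).
sample = json.loads("""
[
  {"id": 1, "user": {"country": "US"}, "plan": "free", "logins": 12},
  {"id": 2, "user": {"country": "DE"}, "plan": "pro",  "logins": 40},
  {"id": 3, "user": {"country": "US"}, "plan": "pro",  "logins": 7}
]
""")

df = to_frame(sample)
print(df.columns.tolist())  # nested keys become dotted columns, e.g. "user.country"
print(df.groupby("plan")["logins"].mean())
```

With a real API you would replace `sample` with `fetch_json(...)` on the endpoint you picked, after checking its documentation for the response shape.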

Zachary Alexander