You've probably heard a lot about data science, artificial intelligence, and big data. Frankly, these areas have attracted a great deal of hype, and that hype has inflated expectations about what data science and data can actually accomplish. On the whole, this has hurt the field. It helps to think about how to separate the hype of data science from the reality.
The first question is always "What is the question you are trying to answer with the data?" If someone comes to you about a big data, artificial intelligence, or data science project and starts talking about the newest distributed computing technology, machine learning, and a string of buzzwords, ask that question first. It narrows the discussion and filters out much of the hype around tools and technologies. Those can be very interesting and fun to talk about, and we like to talk about them too, but they will not add value to your organization on their own.
The second question, once you have identified the question you are trying to answer, is "Do you have the data to actually answer that question?" Very often the question you want to answer and the data you have are not compatible with each other. So you have to ask yourself "Can we get the data we need to answer the question?" Sometimes the answer is simply no, in which case you have to stop (for now). The bottom line: to decide whether a project is hype or reality, you have to decide whether the data people are trying to use is really relevant to the question they are trying to answer.
The third question is "If you could answer the question with the data you have, could you use the answer in a meaningful way?" This goes back to the Netflix competition, which produced a solution to the problem of predicting which videos people would like to watch. It was a very, very good solution, but it could not be implemented cost-effectively with the computing resources Netflix had. Even though they had a specific question, the right data, and an answer, they could not actually apply what they found. If you ask yourself these three questions, you can very quickly determine whether a data science project is all hype or a real contribution that can move your organization forward.