So a friend recently asked me to teach her data analytics and market segmentation to prepare for two very high-paying job interviews she has coming up.
“Data analytics refers to qualitative and quantitative techniques and processes used to enhance productivity and business gain. Data is extracted and categorized to identify and analyze behavioural data and patterns, and techniques vary according to organizational requirements.” – Techopedia.
Now, the thing about data analytics is that it’s really an approach/technique. You can’t teach it except in the context of a specific case. Nonetheless, it’s a common question, so I thought, why not put it into an article?
Data Analytics Requires Data
This is the lifeblood of analytics. You need to be studying a dataset. Before you go “LIKE, DUH!?”… you’d be surprised to know that the dataset isn’t actually the starting point of analytics.
It’s common for organizations to throw you thousands of data sets and ask you to discover something. Sadly, if that’s all you do, you’re not going to maximise your use of data at all. It’s one of the reasons why so many MNCs and governments have BIG DATA but never actually get anything outta it (reason #2 being an inability to act on data).
A good data scientist will always plan for what needs to be known and design the systems to collect data in a way that gives those insights. Massaging raw, unstructured data is the toughest task in the world, because you’re trying to piece random information together to discover that eureka moment. Of course, many situations are unplanned and that’s often all you have to work with. But if you have ANY control over the project at all, PLAN THE DATA YOU WANT TO COLLECT.
Learn From Big Brother Google
Digital/online campaigns are the easiest ones to use data analytics on. The very nature of the internet and websites creates millions of data points we can capture and analyze. Let’s use the simple case of studying whether or not visitors like your website.
Sounds like an easy task. We install Google Analytics and everything’s there for us to see: pages per visit, time spent on the website, new vs. returning visitors, bounce rate, the content they’re reading, navigation flow, etc.
However, did you ever stop to think that if Google Analytics didn’t exist, you’d have to come up with all those metrics yourself? You would have to decide what actually defines “visitors liking a website” and design your system to capture those specific data points!
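To get a feel for what that involves, here is a minimal Python sketch that computes a few of those metrics from a raw log of pageviews. The `PageView` record, its fields, and the metric definitions are simplified assumptions for illustration; they are not how Google Analytics actually defines or calculates its metrics.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PageView:
    visitor_id: str   # hypothetical anonymous visitor ID (e.g. from a cookie)
    session_id: str   # hypothetical ID grouping the pageviews of one visit
    timestamp: float  # seconds since an arbitrary reference point
    page: str


def website_metrics(views: list[PageView]) -> dict[str, float]:
    """Simplified engagement metrics; NOT Google Analytics' exact definitions."""
    sessions: dict[str, list[PageView]] = defaultdict(list)
    for view in views:
        sessions[view.session_id].append(view)
    if not sessions:
        raise ValueError("no pageviews to analyse")

    starts, durations, bounces = [], [], 0
    for session_views in sessions.values():
        times = [v.timestamp for v in session_views]
        starts.append((min(times), session_views[0].visitor_id))
        # Time on site = first to last pageview, so a one-page visit counts as 0 s.
        durations.append(max(times) - min(times))
        if len(session_views) == 1:
            bounces += 1  # a "bounce" here means a visit with exactly one pageview

    # A visit counts as "returning" if the same visitor started an earlier visit.
    seen_visitors: set[str] = set()
    returning = 0
    for _, visitor in sorted(starts):
        if visitor in seen_visitors:
            returning += 1
        seen_visitors.add(visitor)

    n = len(sessions)
    return {
        "pages_per_visit": len(views) / n,
        "bounce_rate": bounces / n,
        "avg_time_on_site_s": sum(durations) / n,
        "returning_visit_share": returning / n,
    }


# Invented sample log for illustration.
views = [
    PageView("alice", "s1", 0, "/"),
    PageView("alice", "s1", 40, "/pricing"),
    PageView("bob", "s2", 10, "/"),
    PageView("alice", "s3", 500, "/blog"),
]
print(website_metrics(views))
```

Even this toy version forces judgment calls, such as counting a one-page visit as zero seconds on site. Those are exactly the design decisions Google Analytics quietly makes for you.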
Not so easy now, is it?
Google made the job so easy for webmasters that they rarely stop to think about how its analytics system was designed. When was the last time you questioned why Google selected those metrics?
To explain any further, I’d need to start using specific examples. Let’s explore the case of the Singapore government’s initiative for us to return our own trays at hawker centres/food courts. It’s a good initiative and reduces Singapore’s reliance on low-cost workers.
So imagine the Singapore government has put you in charge. How would you use data analytics to help drive this initiative?
Step 1: Knowing what to track
Before you start executing all your crazy ideas, let’s take a step back.
We know many Singaporeans don’t return their trays. That’s why there are so many empty trays and plates lying around, and why cleaners scramble to keep the tables clean.
How big is the issue? Do 9/10 people fail to return their trays? Do 10 trays go missing for every 100 carried off? How can you even improve the problem if you don’t know exactly how severe or minor it is? Everything we think we “know” is simply based on what our eyes see. You, the project manager, have ZERO numbers. That’s NOT good.
Step 1 is to start thinking about what the important data points are.
- Number of trays left unattended per 100?
- Avg time taken before trays are cleared?
- Ratio of cleaners to trays?
- Uncollected trays plotted against time on a graph to spot peak problem periods?
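Once the list is agreed, each data point becomes a small, well-defined calculation. The Python sketch below assumes a hypothetical `TrayObservation` record per tray (the hour the diner left, whether they returned the tray, and how long an abandoned tray sat before a cleaner cleared it) plus a head count of cleaners on duty. The field names, and reading the cleaner ratio as trays per cleaner, are illustrative choices, not a real government dataset.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrayObservation:
    hour: int                 # hour of day (0-23) when the diner left the table
    returned_by_diner: bool   # True if the diner returned the tray themselves
    minutes_to_clear: Optional[float] = None  # abandoned trays only: wait until cleared


def tray_metrics(observations: list[TrayObservation], cleaners_on_duty: int) -> dict:
    if not observations or cleaners_on_duty <= 0:
        raise ValueError("need at least one observation and one cleaner")
    abandoned = [o for o in observations if not o.returned_by_diner]
    clear_times = [o.minutes_to_clear for o in abandoned if o.minutes_to_clear is not None]
    return {
        # Trays left unattended per 100 trays observed.
        "unattended_per_100": 100 * len(abandoned) / len(observations),
        # Average wait before an abandoned tray is cleared (None if none were timed).
        "avg_minutes_to_clear": sum(clear_times) / len(clear_times) if clear_times else None,
        # One possible reading of the cleaner/tray ratio: trays handled per cleaner.
        "trays_per_cleaner": len(observations) / cleaners_on_duty,
        # Abandoned trays per hour of day, ready to plot against time.
        "unattended_by_hour": dict(sorted(Counter(o.hour for o in abandoned).items())),
    }


# Invented sample data for illustration.
observations = [
    TrayObservation(12, True),
    TrayObservation(12, False, 6.0),
    TrayObservation(12, False, 10.0),
    TrayObservation(13, True),
    TrayObservation(18, False, 4.0),
]
print(tray_metrics(observations, cleaners_on_duty=2))
```

The hourly counts in `unattended_by_hour` are what you would plot against time to spot peak problem periods.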
Deciding on these requires substantial knowledge of the industry as well as a clear problem definition. It is a collaborative effort between your entire team and requires your clients’/superiors’ approvals. Ultimately, this is a project, and the project succeeds only if the client/superior feels it has succeeded.
The good thing about agreeing on data points early on is that clueless superiors can’t rate you based on their emotions/experience at the end of the project. Data is data. Numbers don’t lie.
Step 2: Collecting the data
After you’ve decided what “success” will look like, it’s time to start looking at solutions that will actually track those metrics. Google Analytics makes it easy for us to do this for website user behaviour. This time, you’re on your own.
There’s actually a similar initiative by the government to reduce manpower for toilet cleaning. One of the methods used to track key data points was to install a “people counter” at the toilet entrance. People counters are usually cameras with the ability to recognize a human subject and automatically count people entering a specific location.
Stage 1 involved just counting the number of people. Cleaners still went in to do their routine cleaning.
However, once a large enough dataset has been collected, patterns will emerge. The system can then predict when a toilet needs cleaning, based on how many people have used it. This means cleaners no longer waste time cleaning toilets that nobody has visited, and extra manpower can be allocated when there is a spike in usage (perhaps due to an event nearby).
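A usage-based trigger of this kind can be sketched in a few lines of Python. This is an assumption about how such a system might work, not a description of the actual toilet-cleaning initiative: `ToiletMonitor`, the `visits_per_clean` threshold, and the 1.5× spike factor are all hypothetical values that would in practice be tuned from the historical data.

```python
from collections import defaultdict
from statistics import mean


class ToiletMonitor:
    """Hypothetical usage-based cleaning trigger fed by a people counter."""

    def __init__(self, visits_per_clean: int = 50):  # threshold is an assumed value
        self.visits_per_clean = visits_per_clean
        self.visits_since_clean = 0

    def record_entry(self) -> bool:
        """Call once per person counted; returns True while a clean is due."""
        self.visits_since_clean += 1
        return self.visits_since_clean >= self.visits_per_clean

    def mark_cleaned(self) -> None:
        self.visits_since_clean = 0


def expected_visits_by_hour(history: list[tuple[int, int]]) -> dict[int, float]:
    """Average visitors per hour of day, from (hour, visitors) samples over many days."""
    by_hour: dict[int, list[int]] = defaultdict(list)
    for hour, visitors in history:
        by_hour[hour].append(visitors)
    return {hour: mean(counts) for hour, counts in sorted(by_hour.items())}


def is_usage_spike(hour: int, visitors: int, expected: dict[int, float], factor: float = 1.5) -> bool:
    """Flag an hour whose traffic is well above normal (e.g. an event nearby)."""
    return visitors > factor * expected.get(hour, 0.0)


monitor = ToiletMonitor(visits_per_clean=3)
alerts = [monitor.record_entry() for _ in range(4)]
print(alerts)  # [False, False, True, True]
monitor.mark_cleaned()

history = [(8, 10), (8, 14), (12, 40), (12, 50)]  # invented sample data
expected = expected_visits_by_hour(history)
print(expected)
print(is_usage_spike(12, 80, expected))  # True
```

A fixed visit threshold is the simplest possible "prediction"; with enough data you could replace it with a model of how quickly each toilet actually gets dirty.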
The solutions you pick to collect the data are very much determined by the type of data you decided to collect. Until you know exactly which metrics you need, it’s impossible to design the data collection system, and any data collected (even BIG data) is largely wasted.
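One way to make that dependency explicit is a tracking plan that lists, for every agreed metric, the raw fields it needs. The Python sketch below uses the tray example; the metric and field names are hypothetical. Checking a proposed collection system against the plan tells you, before any data is collected, which metrics it could never answer.

```python
# Hypothetical tracking plan for the tray project: metric -> raw fields it needs.
TRACKING_PLAN: dict[str, set[str]] = {
    "unattended_trays_per_100": {"tray_id", "returned_by_diner"},
    "avg_minutes_to_clear": {"tray_id", "left_at", "cleared_at"},
    "trays_per_cleaner": {"tray_id", "cleaners_on_duty"},
    "unattended_by_hour": {"returned_by_diner", "left_at"},
}


def required_fields(plan: dict[str, set[str]]) -> set[str]:
    """Every field the data collection system must capture to serve the plan."""
    return set().union(*plan.values())


def unanswerable_metrics(plan: dict[str, set[str]], captured: set[str]) -> list[str]:
    """Metrics a proposed system could never compute, given the fields it captures."""
    return sorted(metric for metric, fields in plan.items() if not fields <= captured)


print(sorted(required_fields(TRACKING_PLAN)))
# A proposed sensor that logs trays but not when they are cleared or who is on shift:
proposal = {"tray_id", "left_at", "returned_by_diner"}
print(unanswerable_metrics(TRACKING_PLAN, proposal))  # ['avg_minutes_to_clear', 'trays_per_cleaner']
```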
Step 3: Define existing benchmarks
The next step is actually pretty easy. Once you’ve started tracking, take a snapshot of the current situation. Collect enough data and patterns might even emerge. You might uncover a deeper problem, or a surprisingly straightforward solution.
In any case, the snapshot now acts as your benchmark. Your job is to implement solutions that improve these metrics.
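A benchmark can be as simple as the average and spread of a metric over the observation period, with later data compared against it. The Python sketch below assumes daily "unattended trays per 100" figures; the numbers in the usage example are invented for illustration. A plain before/after comparison like this does not by itself prove your solution caused the change, but it gives everyone the same yardstick.

```python
from statistics import mean, stdev


def take_benchmark(daily_values: list[float]) -> dict[str, float]:
    """Snapshot of the current situation; needs at least two days for a stdev."""
    return {"mean": mean(daily_values), "stdev": stdev(daily_values), "days": len(daily_values)}


def compare_to_benchmark(baseline: dict[str, float], new_values: list[float]) -> dict[str, float]:
    """How far a later period has moved from the benchmark mean."""
    new_mean = mean(new_values)
    change = new_mean - baseline["mean"]
    return {"new_mean": new_mean, "change": change, "pct_change": 100 * change / baseline["mean"]}


# Invented daily "unattended trays per 100" figures, before and after an intervention.
before = [62, 58, 65, 60, 55]
after = [48, 50, 46, 52, 44]
baseline = take_benchmark(before)
print(baseline)
print(compare_to_benchmark(baseline, after))
```

Keeping the standard deviation alongside the mean helps you judge whether a later change is bigger than the normal day-to-day wobble.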
Step 4: Data analytics
And at long last, we come to “data analytics”. The very thing people ask me to teach them about. I hope it’s clear by now that it isn’t a skill in itself, but actually part of a very logical set of processes.
People often have the impression that data analytics is a really geeky subject, and get scared off. But if you’ve structured everything right, this is actually the easiest part. Of course, when datasets get big enough, you always have the choice to dig deeper: find correlations, build models on them, optimize based on the data, etc. THAT is the geeky stuff, but your team of math geniuses and engineers will help you out.
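As a taste of that "digging deeper", here is a minimal Python implementation of the Pearson correlation coefficient, applied to two hypothetical hourly series (cleaners on duty and trays left unattended). The numbers are invented for illustration, and a strong correlation is a lead worth investigating, not proof that one thing causes the other.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient: +1 perfect positive, -1 perfect negative, 0 none."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length, at least 2 points each")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a series is constant")
    return cov / (spread_x * spread_y)


# Invented hourly figures: cleaners on duty vs. trays left unattended that hour.
cleaners = [1, 2, 3, 4]
unattended = [20, 15, 9, 4]
print(round(pearson(cleaners, unattended), 3))  # -0.999
```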
Just remember: the least you can do for your data scientists is to help them design a system that collects meaningful data, not throw millions of data points at them and expect them to perform miracles.