Domain-driven data science: The secret to accurate insights
Data science is a rapidly growing field with the potential to transform industries and improve decision-making. However, to succeed in it, data scientists need a good understanding of the domain they are working in. In this article, I will discuss why domain knowledge matters for a data scientist and share the lessons I learned while collaborating with a digital marketing agency.
Let’s start with the why.
Why do we need a deep understanding of the domain as data scientists?
Deep understanding…
Allows data scientists to identify the most relevant data sources. Every industry has its unique characteristics and data sources that are most useful for solving problems. A data scientist who understands the domain can identify and collect relevant data more efficiently, saving time and resources. Without a proper understanding of the domain, data scientists may struggle to identify the most important data sources or collect irrelevant data, leading to inaccurate or incomplete insights.
Allows data scientists to pre-process the data correctly. Data preprocessing is a crucial step in any data science project, and understanding the domain helps data scientists to handle missing values, outliers, and other data quality issues more effectively. It also enables them to apply feature engineering techniques that are relevant to the industry and problem at hand, improving the accuracy and relevance of the insights generated.
Enables data scientists to select the right modeling techniques. Industries differ in their characteristics, so the appropriate modeling techniques may vary from one industry to the next. Understanding the domain helps data scientists choose the techniques and algorithms best suited to the industry and the problem at hand. This keeps the insights accurate and relevant, which improves decision-making and drives better outcomes.
Enables data scientists to interpret the results correctly. Data science is not just about generating insights but also about interpreting them correctly. Understanding the domain allows data scientists to interpret the results in the context of the industry and problem at hand, ensuring that the insights generated are actionable and relevant.
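To make the pre-processing point concrete, here is a minimal Python sketch of domain-aware cleaning for ad-performance data. The field names (`ad_id`, `impressions`, `clicks`) and the rules are illustrative assumptions rather than an industry standard: an ad that was never shown cannot have been clicked, so a missing click count there is a true zero instead of something to impute; more clicks than impressions signals a tracking error rather than a legitimate outlier; and click-through rate is left undefined when there are no impressions instead of being reported as 0.

```python
from typing import Optional


def clean_ad_row(row: dict) -> Optional[dict]:
    """Apply illustrative domain rules to one ad-performance record.

    Returns a cleaned copy with a derived 'ctr' field, or None if the
    row should be dropped.
    """
    impressions = row.get("impressions")
    clicks = row.get("clicks")

    # Without a valid impression count the row cannot be interpreted.
    if impressions is None or impressions < 0:
        return None

    if clicks is None:
        # An ad that was never shown cannot be clicked: this gap is a true
        # zero, not a value to impute from the column mean.
        if impressions == 0:
            clicks = 0
        else:
            return None  # genuinely unknown; drop rather than guess

    # More clicks than impressions is a tracking error, not an outlier to keep.
    if clicks < 0 or clicks > impressions:
        return None

    cleaned = dict(row, clicks=clicks)
    # Feature engineering: click-through rate, undefined when nothing was shown.
    cleaned["ctr"] = clicks / impressions if impressions > 0 else None
    return cleaned


def clean_ad_rows(rows: list) -> list:
    """Clean a batch of records, keeping only the rows that pass every rule."""
    return [c for c in (clean_ad_row(r) for r in rows) if c is not None]


# Hypothetical toy records, for illustration only.
raw = [
    {"ad_id": "a1", "impressions": 1000, "clicks": 25},
    {"ad_id": "a2", "impressions": 0, "clicks": None},
    {"ad_id": "a3", "impressions": 500, "clicks": 800},
]
print(clean_ad_rows(raw))
# keeps a1 (ctr 0.025) and a2 (clicks 0, ctr None); drops a3
```

A generic pipeline would likely fill the missing value with a column mean and keep the impossible row as an outlier; knowing the domain is what turns both cases into explicit rules.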
How I did it — a bit of background
My first weeks after moving to Toronto were spent job hunting, studying, and researching. My sister, who works in analytics in the city, had just become an analytics manager at a new company, and her whole team was away on vacation. So it was brother and data scientist to the rescue. The only problem? The domain was digital advertising for big companies such as Coca-Cola and Apple. I knew nothing about digital advertising, so if I wanted to help with the data science projects, I also had to bring my A game in the domain.
I had one weekend to prepare for Monday, so I needed a plan that would give me the maximum return on my effort. My goal was to absorb as much of the necessary knowledge as I could. I like reading technical books, and they all have something in common: a plan for attacking the material. You start with an intro, the foundations, and the scope, and then you get bombarded with technical knowledge. That is exactly how I approached the new domain: as my own book. All that was left was to write the table of contents and start researching.
By the end, my book would look like this:
Saturday:
- Overview of digital media and its impact on society
- Types of digital media, including websites, social media, and mobile applications
- Digital marketing and advertising, including search engine optimization (SEO), pay-per-click (PPC) advertising, and social media advertising
- Content creation and management, including website design, content strategy, and content management systems (CMS)
Sunday:
- Data analytics, including website analytics and social media analytics
- Digital media ethics, including issues related to privacy, freedom of expression, and responsible journalism
- Emerging trends and technologies in digital media, such as artificial intelligence, virtual and augmented reality, and blockchain
- Practical skills, such as using CMS platforms, basic HTML and CSS coding, and image editing software
As I said, I had only a weekend, so maximizing returns meant learning enough to survive.
To be efficient, though, I had to bring in one more “new” secret power. My assistant was... you guessed it, ChatGPT. My newly found, more efficient Google was all in on the mission. Once I had the basic knowledge and facts, I would throw prompts at ChatGPT to elaborate on each fact. When you ground the system with specific questions or ask it to elaborate on specific tasks, it is amazingly helpful.
How did I use it?
I asked the most obvious question: What does it take to have a deep understanding of digital advertising? Why? Because my new domain would be exactly that, data science for digital advertising.
The answer?
A deep understanding of digital advertising refers to a comprehensive knowledge and understanding of the various aspects of digital advertising, including:
- Digital advertising formats: An understanding of the different digital advertising formats, including display ads, search ads, social media ads, video ads, and others.
- Targeting techniques: Knowledge of the various targeting techniques used in digital advertising, such as demographic targeting, behavioural targeting, contextual targeting, and others.
- Ad networks and platforms: An understanding of the different ad networks and platforms used to distribute and display digital ads, including Google Ads, Facebook Ads, Twitter Ads, and others.
- Metrics and measurement: Knowledge of the key metrics used to measure the performance of digital ads, including impressions, clicks, conversions, cost per action (CPA), return on ad spend (ROAS), and others.
- Campaign optimization: Understanding of best practices for optimizing digital advertising campaigns, including adjusting targeting, creative elements, and budget allocation.
- Industry trends and technology: Familiarity with the latest industry trends and technologies related to digital advertising, including emerging formats and platforms, new targeting techniques, and advances in measurement and optimization.
- Legal and ethical considerations: Awareness of the legal and ethical considerations involved in digital advertising, including privacy laws, ad fraud, and deceptive advertising practices.
Okay, so then I asked again: what do you mean by point number 1, what about point number 2, and so on.
Over the next hour or so, I was bombarded with answers, and each one pointed me in the right direction.
Best practices for optimizing digital advertising campaigns, targeting techniques, digital advertising formats, key components of the digital media space, and much more. Every bullet of the first answer generated ten more, and those bullets generated still more. In one weekend I gained a lot of knowledge without wasting any time hunting for answers; all my time went into studying, evaluating, and figuring out the domain.
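Several of the metrics from ChatGPT's first answer (impressions, clicks, conversions, CPA, ROAS) are simple ratios once the raw counts are aggregated per campaign. The minimal Python sketch below assumes per-campaign totals with hypothetical field names; each ratio returns `None` when its denominator is zero rather than raising an error or pretending the value is 0.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CampaignStats:
    """Aggregated totals for one campaign (field names are hypothetical)."""
    impressions: int
    clicks: int
    conversions: int
    spend: float    # ad spend, in the account currency
    revenue: float  # revenue attributed to the campaign


def safe_div(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the ratio is undefined."""
    return numerator / denominator if denominator else None


def campaign_metrics(s: CampaignStats) -> dict:
    """Compute common performance ratios from raw campaign totals."""
    return {
        "ctr": safe_div(s.clicks, s.impressions),              # click-through rate
        "conversion_rate": safe_div(s.conversions, s.clicks),  # conversions per click
        "cpa": safe_div(s.spend, s.conversions),               # cost per action
        "roas": safe_div(s.revenue, s.spend),                  # return on ad spend
    }


# Illustrative numbers only, not real campaign data.
example = CampaignStats(impressions=20_000, clicks=400, conversions=20,
                        spend=500.0, revenue=2_000.0)
print(campaign_metrics(example))
# {'ctr': 0.02, 'conversion_rate': 0.05, 'cpa': 25.0, 'roas': 4.0}
```

ROAS here is a plain ratio (4.0 means four units of revenue per unit of spend); some reporting tools express it as a percentage instead.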
Is that all?
Not exactly. Domain knowledge is the first step toward building a solid foundation in all of the domain's different aspects. However, I also needed some practical knowledge to embark on this data science journey. What did I do?
Case studies.
Again, ChatGPT to the rescue, feeding me success stories from digital marketing case studies, giving me references for datasets, and suggesting ideas for digital advertising projects.
Projects, projects, projects!
With more time to upskill, I would have gone down that route, but the weekend ended and Monday would take me on that journey anyway. Otherwise, I would have continued my dive into the new domain by implementing projects in it.
Approach any problem in a new domain
That was only one example. Whenever I approach a problem in a new domain or industry, I follow this checklist and make sure I tick every box:
- Understand the domain: The first step is to gain a deep understanding of the industry in question. This includes understanding the business goals and objectives, as well as any unique characteristics or challenges of the industry.
- Identify the problem: Once you understand the industry, you need to identify the specific problem you want to solve. This may involve talking to domain experts or conducting research to identify key pain points.
- Collect and pre-process data: After identifying the problem, the next step is to collect and pre-process relevant data. This may involve identifying the sources of data, cleaning and preprocessing the data, and ensuring that the data is of high quality.
- Explore the data: Once you have the data, the next step is to explore the data and gain insights. This may involve visualizing the data, identifying patterns and trends, and using statistical analysis to identify correlations and relationships.
- Build and test models: Based on your understanding of the problem and insights gained from the data, you can start building and testing models. This may involve using machine learning algorithms, statistical models, or other techniques to generate predictions or recommendations.
- Evaluate and refine: After building the model, you need to evaluate its performance and refine it as necessary. This may involve tweaking the model parameters, testing the model on new data, or using different techniques to improve the model’s performance.
- Communicate results: Finally, you need to communicate the results of your analysis to stakeholders in the industry. This may involve creating visualizations or dashboards to communicate insights, presenting your findings to key decision-makers, or writing reports summarizing your analysis.
In summary, as data scientists tackling problems in a new industry, we should all take a systematic approach that involves understanding the domain, identifying the problem, collecting and preprocessing data, exploring the data, building and testing models, evaluating and refining the models, and communicating the results.
When learning data science, please do not spend all your time on toy problems like the Titanic dataset, or solely on studying the newest deep learning framework. Find a domain that excites you, make it personal, ask questions, and then do data science.
I hope my story inspires you to take your next move into the domain you are passionate about!
Thank you!