Shoot for the moon - machine learning for automated online ad detection
Why did a simple tree-based model outperform a complex graph neural network for detecting online ads? The answer is a lesson in practical machine learning.
#1 about 4 minutes
The challenge of manual ad filtering and the moonshot project
Manual ad filter lists are slow and resource-intensive, prompting the "Project Moonshot" initiative to automate ad detection using AI and machine learning.
#2 about 2 minutes
Choosing the right data source for ad detection
The team pivoted from inefficient computer vision models for perceptual ad detection to analyzing HTML structure, which provided richer data for machine learning.
#3 about 3 minutes
Generating labeled training data at scale
A custom crawler combined with a modified Adblock Plus was used to automatically label HTML nodes on 250,000 web pages, creating a large-scale ground truth dataset.
#4 about 4 minutes
Pre-processing HTML data and overcoming key challenges
The data pipeline converted raw HTML into adjacency and feature matrices while solving challenges like severely unbalanced data and slow processing speeds.
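To make the adjacency and feature matrices concrete, here is a minimal sketch of one way raw HTML could be turned into both. It is an illustration under assumptions, not the team's actual pipeline: the class `DomGraphBuilder`, the function `html_to_matrices`, and the four features per node (depth, child count, attribute count, iframe flag) are hypothetical choices, and only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser

# Elements that never have children or a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class DomGraphBuilder(HTMLParser):
    """Collects DOM elements as graph nodes and parent-child links as edges."""

    def __init__(self):
        super().__init__()
        self.nodes = []   # (tag, attribute dict, depth) per element
        self.edges = []   # (parent index, child index)
        self._stack = []  # indices of currently open elements

    def handle_starttag(self, tag, attrs):
        index = len(self.nodes)
        self.nodes.append((tag, dict(attrs), len(self._stack)))
        if self._stack:
            self.edges.append((self._stack[-1], index))
        if tag not in VOID_TAGS:
            self._stack.append(index)

    def handle_endtag(self, tag):
        # Close the most recent open element with this tag, if there is one.
        for pos in range(len(self._stack) - 1, -1, -1):
            if self.nodes[self._stack[pos]][0] == tag:
                del self._stack[pos:]
                break


def html_to_matrices(html):
    """Return (adjacency, features) for the element tree of an HTML string."""
    builder = DomGraphBuilder()
    builder.feed(html)
    builder.close()

    n = len(builder.nodes)
    adjacency = [[0] * n for _ in range(n)]
    child_counts = [0] * n
    for parent, child in builder.edges:
        adjacency[parent][child] = 1  # undirected graph: mark both directions
        adjacency[child][parent] = 1
        child_counts[parent] += 1

    # Hypothetical per-node features, chosen only for illustration.
    features = [
        [depth, child_counts[i], len(attrs), 1 if tag == "iframe" else 0]
        for i, (tag, attrs, depth) in enumerate(builder.nodes)
    ]
    return adjacency, features


example_html = '<div id="a"><p>hi</p><iframe src="x"></iframe></div>'
adjacency, features = html_to_matrices(example_html)
```

Each row of `features` describes one element and each row of `adjacency` records which other elements it is directly connected to. A graph neural network would consume both matrices, while a tree-based model could work from the feature rows alone.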
#5 about 6 minutes
Experimenting with different machine learning model approaches
Several models were tested for ad classification, including graph neural networks, traditional classifiers with node embeddings, and tree-based models like XGBoost.
#6 about 3 minutes
Comparing model performance and planning future improvements
Tree-based models significantly outperformed graph neural networks in F1 score, and future work will explore self-supervised learning and more diverse data.
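F1 score is a natural metric when ad nodes are rare, because plain accuracy rewards a model that never flags anything. The sketch below shows this with made-up labels (5 ad nodes among 100). The numbers are illustrative assumptions, not results from the talk.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (1 = ad node), computed from raw counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of nodes labeled correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Made-up, heavily unbalanced labels: 5 ad nodes followed by 95 non-ad nodes.
y_true = [1] * 5 + [0] * 95

# A useless model that never predicts "ad".
always_negative = [0] * 100

# A hypothetical detector: finds 4 of the 5 ads, with 2 false alarms.
detector = [1, 1, 1, 1, 0] + [1, 1] + [0] * 93

acc_negative = accuracy(y_true, always_negative)
f1_negative = f1_score(y_true, always_negative)
acc_detector = accuracy(y_true, detector)
f1_detector = f1_score(y_true, detector)
```

On this toy data the always-negative model reaches 95% accuracy with an F1 of 0, while the detector's accuracy is only slightly higher (97%) even though its F1 is about 0.73. That gap is why F1 separates models more clearly than accuracy on unbalanced data.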
#7 about 3 minutes
Deploying machine learning models in a JavaScript environment
The team tackled deployment challenges by converting Python models to JavaScript, optimizing for latency by moving the model to a background script, and using TensorFlow.js.
#8 about 5 minutes
Answering questions on model circumvention and design choices
The speakers address audience questions regarding how ad companies might circumvent the model and the rationale behind their model experimentation process.