Muhammad Danial Khilji

Data Scientist | Machine Learning Engineer

Google Certified Generative AI Leader

RSS Certified Professional Data Scientist and Data Analyst

About

I’m a Data Scientist at WPP, specializing in knowledge representation, graph-based systems, and applied machine learning. I design and productionize semantic systems for audience and identity mapping, using embeddings, probabilistic matching, and graph-based approaches. My work on the Audience Translator platform has enabled 10K+ users to map taxonomies to platforms like Meta and Google. I build end-to-end solutions that combine LLMs, retrieval-augmented generation (RAG), AI agents, and structured data, including taxonomy enrichment, automated validation workflows, and scalable Python pipelines. More recently, I’ve focused on AI agent systems, developing a LangGraph-based multi-agent framework that automates ad-tech platform research by extracting API insights, audience taxonomies, and reach estimates. I’m particularly interested in building intelligent systems at the intersection of structured knowledge, retrieval, and AI agents.

Skills

Domains & Expertise

Machine Learning
Data Science
Data Analysis
Computer Vision
Natural Language Processing (NLP)
Language Models (LMs)
Knowledge Graphs
Semantic Web
Retrieval Augmented Generation (RAG)
AI Agents
Entity Resolution
API Development
MLOps
Model Context Protocol (MCP)
SQL
Data Engineering
Data Pipelines
Data Clean Rooms
Data Visualization
Web Development

Technologies & Tools

Python
MySQL
PostgreSQL
R
C++
LangChain
LangGraph
Neo4j
OpenAI API
Anthropic API
Hugging Face
Docker
FastAPI
Pandas
NumPy
Scikit-learn
TensorFlow
PyTorch
OpenCV
Jupyter
Apache Airflow
Power BI
Excel
Google Analytics
PySpark
Flask
Git
VS Code
Linux
BigQuery ML
Snowflake
Azure AI
Google Vertex AI
Google Gemini Enterprise
Tableau
RStudio
Kubeflow
Arduino
MATLAB
MATLAB Simulink
Google Studio
Flourish
Weka
Astro
TypeScript
Tailwind CSS
HTML/CSS
JavaScript
GitHub Actions
GitHub Pages

Publications

Features matching using natural language processing

International Journal on Cybernetics & Informatics (IJCI) - Mar 24, 2023

Feature matching is a basic step in matching different datasets. This article presents a new hybrid model that runs BERT, a pretrained Natural Language Processing (NLP) model, in parallel with a statistical model based on Jaccard similarity to measure the similarity between lists of features from two different datasets. This reduces the time required to search for correlations or to manually match each feature from one dataset to another.
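As a rough illustration of the statistical half of that idea, the sketch below scores pairs of feature names with Jaccard similarity over word tokens. It is a minimal example under stated assumptions, not the published model: the tokenization rule and the helper names (`tokenize`, `jaccard`, `best_matches`) are hypothetical, the BERT component is left out, and the abstract does not say how the two scores are combined.

```python
import re


def tokenize(name: str) -> set[str]:
    """Lowercase a feature name and split it into alphanumeric tokens (assumed rule)."""
    return set(re.findall(r"[a-z0-9]+", name.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity: size of the token intersection over size of the union."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    if not tokens_a and not tokens_b:
        return 0.0  # convention chosen here for two empty names
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def best_matches(source: list[str], target: list[str]) -> dict[str, str]:
    """Map each source feature to the target feature with the highest Jaccard score."""
    return {s: max(target, key=lambda t: jaccard(s, t)) for s in source}


# Hypothetical feature lists from two datasets.
matches = best_matches(
    ["customer_id", "order_date"],
    ["date of order", "Customer ID"],
)
```

Token overlap alone misses synonyms such as "client" and "customer", which is presumably where an embedding model like BERT would help.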

Analysis Photovoltaic System in Relation to Tracking and Non-Tracking System

Journal of Fundamentals of Renewable Energy and Applications - Feb 15, 2021

The increasing demand for electricity has been a great concern in recent years. This demand, together with environmental issues such as global warming, has pushed scientists to advance the field of renewable energy. Solar energy is one of the major sources of renewable energy. Photovoltaic (PV) cells produce electrical energy when light particles knock electrons free from atoms. The electrical output of the system depends on the amount of solar energy received by the PV cells. To increase solar energy output, a fixed solar panel inclined towards the optimal point is usually used. The collection of solar energy can be increased further with single-axis or dual-axis solar tracking systems, which continuously track the sun using the incidence angle of sunlight. This analysis compares the performance of tracking and non-tracking photovoltaic systems. Data from specific solar panel systems is analysed and compared with simulations and actual outputs to compute performance ratios and draw conclusions. The average performance ratio is found to be 0.73 for non-tracking systems and 0.90 for tracking systems (17 percentage points higher than for non-tracking systems). The accuracy of the estimated output of a PV system can be improved by using more accurate solar irradiance data, accurate weather conditions, exact system losses, and matched inverter efficiency. The efficiency of a PV system can be improved by using solar trackers, using more efficient solar panels, installing them in a less shaded area, cleaning the panels at regular intervals, and using more efficient electrical components.
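The reported ratios can be checked with a few lines of arithmetic. The sketch below assumes the common definition of performance ratio as actual energy output divided by expected output. The abstract does not spell out the exact calculation, so the `performance_ratio` helper and its parameter names are illustrative only.

```python
def performance_ratio(actual_kwh: float, expected_kwh: float) -> float:
    """Actual energy output divided by expected output (assumed definition)."""
    if expected_kwh <= 0:
        raise ValueError("expected_kwh must be positive")
    return actual_kwh / expected_kwh


# Average performance ratios reported in the abstract.
pr_non_tracking = 0.73
pr_tracking = 0.90

# Absolute difference between the two ratios (in ratio units).
absolute_gain = pr_tracking - pr_non_tracking

# Relative improvement of tracking over non-tracking.
relative_gain = absolute_gain / pr_non_tracking

print(f"absolute gain: {absolute_gain:.2f}")
print(f"relative gain: {relative_gain:.1%}")
```

The gap between 0.73 and 0.90 is 0.17 in absolute terms, which corresponds to a relative improvement of roughly 23% over the non-tracking baseline.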

Languages

🇬🇧 English
🇵🇰 Urdu
🇩🇪 German
🇵🇰 Punjabi