Data Scientist
Nielsen · Bengaluru, IN, India
About The Role
Role Overview
As a Hybrid Data Scientist, you will sit at the intersection of high-scale data pipelining and advanced statistical methodology. You will be responsible for the end-to-end lifecycle of Incremental Reach and Audience Measurement products, from architecting Python-based data pipelines to implementing Bayesian and Machine Learning models that quantify the lift of Digital media over a Linear TV baseline.
Key Responsibilities
- Advanced Statistical Modeling (The "Science" Side)
- Incremental Reach Frameworks:
- Small-N Datasets: Implement Bayesian Model Averaging (BMA) to cycle through regression combinations, providing robust coefficients and credible intervals when study data is limited.
- Large-Scale Prediction: Deploy Gradient Boosted Regression Trees (GBM) to identify non-linear patterns and rank the impact of "Reach Drivers" (Media Weight, On-Target %, Frequency).
- Audience Deduplication: Use Maximum Entropy (MaxEnt) models to estimate unique audience reach across fragmented platforms by reconciling census and panel data.
- Additional Frameworks:
- Mixed-Effect Models: Use Hierarchical/Multilevel modeling to account for nested data (e.g., campaigns nested within specific industry verticals).
- Causal Lift: Apply Synthetic Control Methods to measure incremental shifts in behavior for campaigns with fixed timeframes where a clean control group is unavailable.
- Data Engineering & Pipeline Architecture (The "Engineering" Side)
- Python-Centric ETL: Architect and maintain robust data pipelines using Python (Pandas, PySpark) to ingest, clean, and harmonize data from Linear TV logs and Digital ad servers.
- Feature Engineering: Automate the extraction of Base Drivers (GRP, Reach Efficiency, Seasonality) and Custom Drivers (Share of Voice, Flighting) into a supervised learning-ready schema.
- Productionization: Wrap statistical models into production-grade APIs or scheduled containers (Docker/Airflow) to ensure repeatable and scalable measurement.
- Cloud Operations: Manage large-scale datasets in cloud data warehouses and platforms (Snowflake, AWS, or GCP), optimizing SQL queries for high-performance analytics.
- Experimental Design & Methodology
- Control/Test Logistics: Design scientifically valid Control and Test groups, ensuring proper randomization or using Propensity Score Matching to mitigate selection bias.
- Variable Importance: Provide stakeholders with Posterior Inclusion Probabilities to identify which media levers (Duration, Weight, etc.) most consistently drive incremental reach.
- Cross-Media Calibration: Reconcile Linear TV's "One-to-Many" metrics with Digital's "One-to-One" tracking to provide a unified view of the consumer.
Qualifications
- Experience: 3-6 years of statistical model development, mastery of Python (specifically for data manipulation and ML), and advanced SQL. Experience with PySpark or Dask for distributed computing is a plus.
- Statistical Mastery: Proven experience with GBM (XGBoost/LightGBM) and Bayesian frameworks (e.g., PyMC, Stan, or R-BMA), among other data science methods.
- Media Knowledge: Understanding of Linear TV vs. Digital dynamics, including Reach/Frequency, GRPs, and Deduplication logic.
- Education: Bachelor’s or Master’s in a quantitative field (Statistics, Computer Science, Economics) or equivalent professional experience.
Please be aware that job-seekers may be at risk of targeting by scammers seeking personal data or money. Nielsen recruiters will only contact you through official job boards, LinkedIn, or email with a nielsen.com domain. Be cautious of any outreach claiming to be from Nielsen via other messaging platforms or personal email addresses. Always verify that email communications come from an @nielsen.com address. If you're unsure about the authenticity of a job offer or communication, please contact Nielsen directly through our official website or verified social media channels.
This listing was posted by a verified recruiter at Nielsen.
JobSpring