2016-07-02

Data Engineer at iHeartRadio

32 Avenue of the Americas, New York, NY 10013

iHeartRadio, iHeartMedia’s free digital radio service, is the No. 1 all-in-one digital audio service with over 600 million downloads. It reached its first 20 million registered users faster than any digital service in Internet history, and reached 70 million users faster than any digital music service and even faster than Twitter, Facebook and Pinterest. The company’s operations include radio broadcasting, online, mobile, digital and social media, live concerts and events, syndication, music research services and independent media representation.

**Position Overview:**

iHeartMedia has data problems, both big and small. This means managing extensive listening history from over 70 million users, music metadata on over 15 million tracks, data processing for 850+ broadcast stations, ad sales/impressions, and targeted advertising campaigns. The Data Engineer who joins our team will shape how people listen to the radio and how the marketing and advertising industry connects with our millions of listeners, and will help drive important business decisions with a robust data platform.

Many of our big data problems have led us to build solutions on Redshift, Hadoop and Hive using frameworks like Luigi and Celery. As part of the data platform team, you will refine existing processes, import external data sources and create data mashups to provide valuable insights. While the team is growing, it is still small, and everyone involved has the opportunity to make a lasting impact and help transform the radio industry by leveraging data insights to bring our digital products to life.
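For candidates unfamiliar with Luigi, the sketch below shows the general shape of the batch tasks described above: one task stands in for a raw listening-event file landed by an upstream process, and a second task summarizes it per station. It is illustrative only, not code from our platform, and the file paths, field layout and task names are hypothetical.

```python
import luigi


class ExtractListens(luigi.ExternalTask):
    """Placeholder for a raw listening-event file dropped by an upstream process."""
    date = luigi.DateParameter()

    def output(self):
        return luigi.LocalTarget("raw/listens-{:%Y-%m-%d}.tsv".format(self.date))


class SummarizeListens(luigi.Task):
    """Aggregate one day of raw events into per-station listening totals."""
    date = luigi.DateParameter()

    def requires(self):
        return ExtractListens(date=self.date)

    def output(self):
        return luigi.LocalTarget("summaries/listens-{:%Y-%m-%d}.tsv".format(self.date))

    def run(self):
        totals = {}
        with self.input().open("r") as events:
            for line in events:
                # Assumed layout: station_id <tab> user_id <tab> seconds_listened
                station, _user, seconds = line.rstrip("\n").split("\t")
                totals[station] = totals.get(station, 0) + int(seconds)
        with self.output().open("w") as out:
            for station, total in sorted(totals.items()):
                out.write("{}\t{}\n".format(station, total))


if __name__ == "__main__":
    luigi.run()
```

Run as, e.g., `python summarize_listens.py SummarizeListens --date 2016-07-02 --local-scheduler`; Luigi skips tasks whose output already exists, which is what makes pipelines like this easy to re-run and backfill.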

**Responsibilities:**

+ Work in an Agile development methodology and own data-driven solutions end-to-end
+ Identify performance bottlenecks in data pipelines and architect quicker, more efficient solutions when necessary. This may involve reaching out to internal teams and external partners to ensure the appropriate optimization standards are being followed.
+ Create new data warehouse solutions and ensure best practices are followed in schema and table design
+ Develop end-to-end ETL processes in Python to send large data sets to a Hadoop cluster and bring summarized results back into a Redshift data warehouse for downstream business analysis (see the sketch after this list). The data sources can include Kafka, flat files and REST APIs.
+ When needed, perform data housekeeping, data cleansing, normalization, hashing, and implementation of required data model changes.
+ Increase efficiency and automate processes by collaborating with the data platform team to update existing data infrastructure (data model, hardware, cloud services, etc.)
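As a hedged illustration of the last leg of that pipeline, loading a Hadoop-produced summary file from S3 back into Redshift usually comes down to issuing a COPY statement from Python. The cluster endpoint, credentials, bucket, table and IAM role below are placeholders, not real infrastructure.

```python
import psycopg2

# Hypothetical Redshift connection details.
REDSHIFT_DSN = (
    "host=example-cluster.redshift.amazonaws.com port=5439 "
    "dbname=analytics user=etl_user password=secret"
)

# COPY a tab-separated summary file written by a Hadoop job into a summary table.
COPY_SQL = r"""
    COPY daily_station_totals (station_id, total_seconds)
    FROM 's3://example-bucket/summaries/listens-2016-07-02.tsv'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy'
    DELIMITER '\t';
"""


def load_summary():
    conn = psycopg2.connect(REDSHIFT_DSN)
    try:
        with conn.cursor() as cur:
            cur.execute(COPY_SQL)
        conn.commit()
    finally:
        conn.close()


if __name__ == "__main__":
    load_summary()
```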

**Requirements:**

+ Ability to write well-abstracted, reusable code components in Python
+ A self-starter who can find data anomalies and fix issues without direction.
+ Willingness and interest to work in new areas and across multiple technologies such as Kafka, RabbitMQ, Redshift, Hadoop and Linux
+ Ability to investigate data issues across a large and complex system by working alongside multiple departments and systems.

**Nice-to-Haves:**

+ Experience with Hadoop & Hive, as well as Python's Luigi ETL framework, is a huge plus
+ Exposure to Amazon Web Services, especially S3, EC2, Redshift and EMR
+ Knowledge of development environment and provisioning tools such as Vagrant, Chef and Docker
+ Understanding of modern version control tools, such as Git, as well as Continuous Integration tools such as Travis or Jenkins
