
Data Engineering Open Forum at Netflix. Was. A. Blast!

4/24/2024 Paweł Mikler


Last Thursday, April 18th, the Netflix campus in Los Gatos, California, turned into the epicenter for experienced data practitioners from well-known organizations such as Tesla, Airbnb, OpenAI, LinkedIn, Meta, and Warner Bros. Discovery. They came to connect with fellow professionals and to bridge the gap between technical depth and accessible discussion in the data engineering field.

The event was kicked off and orchestrated by the incredible (and hardworking!) Xinran Waibel, who introduced the idea behind the first-ever Data Engineering Open Forum at Netflix and the importance of an ongoing dialogue within the data professional community.

Xinran Waibel, welcoming attendees of the first-ever Data Engineering Open Forum at Netflix

The event was very useful because it revealed an impressive set of Netflix tools: Pensive (an error classification service that leverages a rule-based classifier), Nightingale (a service that runs an ML model trained with Metaflow and is responsible for generating a retry recommendation), Scheduler (a service that schedules jobs; the current implementation uses Netflix Maestro), and ConfigService. Together they use neural networks to refine Spark configs for more reliable pipeline retries. The recommended configurations are saved in ConfigService as a JSON patch, with a scope that specifies which jobs can use them.

Binbing Hou, Senior Software Engineer at Netflix explaining the sequence of service calls with Auto Remediation
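
The idea of storing a recommendation as a scoped JSON patch can be sketched roughly as follows. This is only an illustration under my own assumptions: the field names (`scope`, `patch`, `workflow_id`, `step_id`), the helper functions, and the config values are hypothetical and are not Netflix's actual ConfigService schema or API. The patch handling covers just a lenient, top-level subset of JSON Patch (`add`, `replace`, `remove`).

```python
import copy

# Hypothetical recommendation record; the field names are illustrative only.
RECOMMENDATION = {
    "scope": {"workflow_id": "wf_daily_agg", "step_id": "spark_etl"},
    "patch": [
        {"op": "replace", "path": "/spark.executor.memory", "value": "8g"},
        {"op": "add", "path": "/spark.executor.memoryOverhead", "value": "2g"},
    ],
}


def scope_matches(scope, job):
    """True when the job carries every key/value pair listed in the scope."""
    return all(job.get(key) == value for key, value in scope.items())


def apply_patch(config, patch):
    """Apply top-level add/replace/remove operations to a copy of config."""
    patched = copy.deepcopy(config)
    for operation in patch:
        key = operation["path"][1:]  # drop the leading "/" of the JSON Pointer
        if operation["op"] in ("add", "replace"):
            patched[key] = operation["value"]
        elif operation["op"] == "remove":
            patched.pop(key, None)  # lenient: a missing key is ignored
        else:
            raise ValueError(f"unsupported operation: {operation['op']}")
    return patched


def config_for_retry(job, config, recommendation):
    """Return the patched config if the job is in scope, else the original."""
    if scope_matches(recommendation["scope"], job):
        return apply_patch(config, recommendation["patch"])
    return config
```

In this sketch, a job whose `workflow_id` and `step_id` match the scope is retried with the patched memory settings, while any other job keeps its original configuration untouched.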


Jide Ogunjobi of Context Data (Techstars 2024) caught everyone’s attention with an AI agent that builds a data ontology model (logical and physical) and serves it to LLMs via a knowledge graph. A case study of a manufacturing firm was a great example of how Jide’s solution can be incorporated into a complex domain such as supply chain operations in order to:

  • deploy the agent to integrate supply chain data
  • map data across the majority of the client's systems
  • create an integrated, end-to-end view of the supply chain in the ontological model

Jide Ogunjobi, Founder at Context Data explaining their AI Agent solution


The solution is an interesting example of how complex data integration for optimization analysis can be for an enterprise client, and of how challenging it is to apply such a solution in a domain-specific context.

Jason Reid, former Director of Data Science & Engineering at Netflix and co-founder of Tabular, beautifully demonstrated the flexibility of Apache Iceberg. He explained the versatility of Iceberg and the reliability this open table format brings to big data, while making it possible for engines like Spark, Trino, Flink, and Presto to work with the same tables.

Jason Reid, Head of Product at Tabular, at the Data Engineering Open Forum at Netflix

One of my favorite talks during the event was Clark Wright explaining Airbnb’s innovative method for assessing data warehouse quality, which has huge potential to set a new standard for data productivity in the industry. It was very interesting to hear about the gold standards for defining data quality dimensions at Airbnb:

  • Accuracy, Reliability, Stewardship and Usability Scores

and about how two types of users look at the score: the Data Producer (well-built data rises to the top of data consumer demand) and the Data Consumer (data quality becomes something data consumers demand).
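
To illustrate how four dimension scores could roll up into a single number that both producers and consumers can act on, here is a small Python sketch. It is my own simplification, not Airbnb's formula: the equal weights, the 0 to 100 scale, the function names, and the sample assets are all hypothetical.

```python
# Hypothetical equal weights; the real weighting was not part of my notes.
WEIGHTS = {
    "accuracy": 0.25,
    "reliability": 0.25,
    "stewardship": 0.25,
    "usability": 0.25,
}


def quality_score(dimension_scores, weights=WEIGHTS):
    """Weighted average of per-dimension scores, each expected in 0..100."""
    total = 0.0
    for name, weight in weights.items():
        score = dimension_scores[name]  # raises KeyError if a dimension is missing
        if not 0 <= score <= 100:
            raise ValueError(f"{name} score out of range: {score}")
        total += weight * score
    return total


def rank_assets(assets):
    """Return asset names ordered from highest to lowest quality score."""
    return sorted(assets, key=lambda name: quality_score(assets[name]), reverse=True)


# Illustrative, made-up assets and scores.
ASSETS = {
    "bookings": {"accuracy": 80, "reliability": 100, "stewardship": 60, "usability": 40},
    "listings": {"accuracy": 90, "reliability": 90, "stewardship": 90, "usability": 90},
}
```

Ranking assets by such a score is one way to picture the producer's view (well-built data rises to the top), while showing the score next to each asset reflects the consumer's view.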

Clark Wright, Staff Analytics Engineer at Airbnb, talking about an innovative method to measure data warehouse quality at scale

To better address the specific needs of Airbnb customers and create data-sensitive products and services, Airbnb's data engineering team scales data quality by:

  • allowing data consumers to have easy access to the best data for their use cases
  • surfacing the quality of data assets throughout the consumption workflow
  • expanding the scope of scored assets
  • trying to automate the identification of Airbnb's highest-value assets
  • building data quality directly into paved-path tooling and coupling it tightly with that tooling

The event at Netflix took just one full day, and it was one of the most productive and well-spent days of my entire professional career. Long lines of attendees with so many on-point questions for the speakers, the beautiful Netflix campus bathed in the Californian sun, and enlightening sessions from the gold-standard data engineering community make me want to come back very soon!

I’m thankful for being able to share know-how and to introduce Clurgo to the #dataengineering family (I think I was the only attendee that came from Europe to attend this event).

I met such wonderful people (Karl Eden, Stephanie Vezich Tamayo, Jessica Larson, Tulika Bhatt, Martin Franco, and so many more) and reconnected with my old friend Maciej Kaziród.

Special thanks go to Xinran Waibel, Rashmi Shamprasad, Chris Colburn, Jai Balani, and Patricia Ho for the hard work of organizing such a remarkable event! You’re the true hidden heroes!

Until next time,
Pawel
