Kafka Summit New York 2019

View sessions and slides from Kafka Summit New York 2019

Keynotes

Keynote: Events Everywhere

  • Jay Kreps, Confluent

Keynote: Spring Boot+Kafka: The New Enterprise Platform

  • James Watters, Pivotal

Core Kafka

Lessons Learned Building a Connector Using Kafka Connect

  • Katherine Stanley, IBM United Kingdom Ltd.
  • Andrew Schofield, IBM

While many companies are embracing Apache Kafka as their core event streaming platform, they may still have events they want to unlock in other systems. Kafka Connect provides a common API for developers to do just that, and the number of open-source connectors available is growing rapidly. The IBM MQ sink and source connectors allow you to flow messages between your Apache Kafka cluster and your IBM MQ queues. In this session I will share our lessons learned and top tips for building a Kafka Connect connector. I’ll explain how a connector is structured, how the framework calls it, and some of the things to consider when providing configuration options. The more Kafka Connect connectors the community creates the better, as it will enable everyone to unlock the events in their existing systems.
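
As a rough illustration of the structure the talk walks through, here is a minimal sketch of a source connector and its task using the Kafka Connect API; the class names, the configuration key and the version string are hypothetical, and error handling is omitted:

    import org.apache.kafka.common.config.ConfigDef;
    import org.apache.kafka.connect.connector.Task;
    import org.apache.kafka.connect.source.SourceConnector;
    import org.apache.kafka.connect.source.SourceRecord;
    import org.apache.kafka.connect.source.SourceTask;

    import java.util.Collections;
    import java.util.List;
    import java.util.Map;

    // Hypothetical connector: the framework drives these lifecycle methods.
    public class ExampleSourceConnector extends SourceConnector {
        private Map<String, String> config;

        @Override public void start(Map<String, String> props) { this.config = props; }
        @Override public Class<? extends Task> taskClass() { return ExampleSourceTask.class; }
        @Override public List<Map<String, String>> taskConfigs(int maxTasks) {
            // Split the work across tasks; here a single task simply reuses the connector config.
            return Collections.singletonList(config);
        }
        @Override public void stop() { }
        @Override public ConfigDef config() {
            // Declares the configuration options the connector accepts.
            return new ConfigDef()
                .define("mq.queue", ConfigDef.Type.STRING, ConfigDef.Importance.HIGH, "Source queue name");
        }
        @Override public String version() { return "0.0.1"; }
    }

    class ExampleSourceTask extends SourceTask {
        @Override public void start(Map<String, String> props) { /* open a connection to the source system */ }
        @Override public List<SourceRecord> poll() throws InterruptedException {
            // Called repeatedly by the framework; return records read from the source, or null if none.
            return null;
        }
        @Override public void stop() { /* close the connection */ }
        @Override public String version() { return "0.0.1"; }
    }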

Troubleshooting as Your Kafka Clusters Grow

  • Krunal Vora, Tinder

The graphical curve of ‘getting things done with Kafka’ vs. ‘time spent with Kafka’ rises pretty quickly before starting to plateau. Kafka, as we know, is great to get started with out of the box. Eventually, though, anyone who has used Kafka for a reasonable amount of time will come across problems, both trivial and hardcore. More often than not, these problems feel tedious and repetitive, and could be addressed better by the Kafka community. Issues like skewed partitions and brokers, partition reassignment for a large number of topics, working with messed-up consumer offsets, distributing workload evenly to partitions, and monitoring and viewing the Kafka cluster better sound familiar across the community. Come learn from our experiences trying to overcome and automate some of the most common problems surrounding scaling Kafka clusters at Tinder. Let’s create an environment for sharing best practices around the use of Kafka and its tooling.

No More Silos: Integrating Databases into Apache Kafka®

  • Robin Moffatt, Confluent

Companies new and old are all recognising the importance of a low-latency, scalable, fault-tolerant data backbone, in the form of the Apache Kafka streaming platform. With Kafka, developers can integrate multiple sources and systems, which enables low latency analytics, event-driven architectures and the population of multiple downstream systems. In this talk, we’ll look at one of the most common integration requirements – connecting databases to Kafka. We’ll consider the concept that all data is a stream of events, including that residing within a database. We’ll look at why we’d want to stream data from a database, including driving applications in Kafka from events upstream. We’ll discuss the different methods for connecting databases to Kafka, and the pros and cons of each. Techniques including Change-Data-Capture (CDC) and Kafka Connect will be covered, as well as an exploration of the power of KSQL for performing transformations such as joins on the inbound data.

Attendees of this talk will learn:
* That all data is event streams; databases are just a materialised view of a stream of events.
* The best ways to integrate databases with Kafka.
* Anti-patterns of which to be aware.
* The power of KSQL for transforming streams of data in Kafka.
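
To make the “database as a stream of events” idea concrete, here is a small sketch, expressed with the Kafka Streams Java API rather than the KSQL used in the talk, that joins an event stream against a table populated from a database by a CDC/Kafka Connect source; the topic names and the enrichment logic are hypothetical:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;

    import java.util.Properties;

    public class OrdersEnricher {
        public static void main(String[] args) {
            StreamsBuilder builder = new StreamsBuilder();

            // "customers" is assumed to be kept up to date by a CDC / Kafka Connect source from the database.
            KTable<String, String> customers = builder.table("customers");
            KStream<String, String> orders = builder.stream("orders");

            // Enrich each order with the latest state of its customer row.
            orders.join(customers, (order, customer) -> order + " | " + customer)
                  .to("orders-enriched");

            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "orders-enricher");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            new KafkaStreams(builder.build(), props).start();
        }
    }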

Exactly Once Semantics Revisited

  • Jason Gustafson, Confluent

Two years ago, we helped to contribute a framework for exactly once semantics (or EOS) to Apache Kafka. This much-needed feature brought transactional guarantees to stream processing engines such as Kafka Streams. In this talk, we will recount the journey since then and the lessons we have learned as usage has gradually picked up steam. What did we get right and what did we get wrong? Most importantly, we will discuss how the work is continuing to evolve in order to provide more reliability and better performance. This talk assumes basic familiarity with Kafka and the log abstraction. What you will get out of it is a deeper understanding of the underlying architecture of the EOS framework in Kafka, what its limitations are, and how you can use it to solve problems.
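
For readers new to the EOS framework, here is a minimal sketch of the transactional producer API that underpins it; the broker address, topics and transactional id are placeholders:

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.KafkaException;
    import org.apache.kafka.common.errors.ProducerFencedException;
    import org.apache.kafka.common.serialization.StringSerializer;

    import java.util.Properties;

    public class TransactionalProducerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            // A transactional id enables the exactly-once machinery discussed in the talk.
            props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "payments-tx-1");

            KafkaProducer<String, String> producer = new KafkaProducer<>(props);
            producer.initTransactions();
            try {
                producer.beginTransaction();
                producer.send(new ProducerRecord<>("payments", "order-42", "captured"));
                producer.send(new ProducerRecord<>("audit", "order-42", "captured"));
                producer.commitTransaction();   // both records become visible atomically
            } catch (ProducerFencedException e) {
                producer.close();               // another instance took over this transactional id
            } catch (KafkaException e) {
                producer.abortTransaction();    // the transaction is rolled back
            }
        }
    }

In Kafka Streams, the same machinery is switched on with a single setting, processing.guarantee=exactly_once.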

Flexible Authentication Strategies with SASL/OAUTHBEARER

  • Michael Kaminski, The New York Times
  • Ron Dagostino, State Street Corp

In order to maximize Kafka accessibility within an organization, Kafka operators must choose an authentication option that balances security with ease of use. Kafka has historically been limited to a small number of authentication options that are difficult to integrate with a Single Sign-On (SSO) strategy, such as mutual TLS, basic auth, and Kerberos. The arrival of SASL/OAUTHBEARER in Kafka 2.0.0 affords system operators a flexible framework for integrating Kafka with their existing authentication infrastructure. Ron Dagostino (State Street Corporation) and Mike Kaminski (The New York Times) team up to discuss SASL/OAUTHBEARER and its real-world applications. Ron, who contributed the feature to core Kafka, explains the origins and intricacies of its development along with additional, related security changes, including client re-authentication (merged and scheduled for release in v2.2.0) and the plans for support of SASL/OAUTHBEARER in librdkafka-based clients. Mike Kaminski, a developer on the Publishing Pipeline team at The New York Times, talks about how his team leverages SASL/OAUTHBEARER to break down silos between teams by making it easy for product owners to get connected to the Publishing Pipeline’s Kafka cluster.
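
For orientation, a client-side configuration sketch for SASL/OAUTHBEARER as introduced in Kafka 2.0.0; the broker address is a placeholder, and the login callback handler class is a hypothetical stand-in for whatever integrates with your token provider:

    import java.util.Properties;

    public class OAuthClientConfig {
        public static Properties oauthClientProps() {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker:9093");
            props.put("security.protocol", "SASL_SSL");
            props.put("sasl.mechanism", "OAUTHBEARER");
            // The login module ships with Kafka; token retrieval is delegated to a callback handler.
            props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required;");
            // Hypothetical handler that fetches tokens from your SSO / OAuth provider.
            props.put("sasl.login.callback.handler.class",
                "com.example.auth.MyOAuthBearerLoginCallbackHandler");
            return props;
        }
    }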

Kafka on Kubernetes: Does it really have to be "The Hard Way"?

  • Viktor Gamov, Confluent
  • Michael Ng, Confluent

When it comes to choosing a distributed streaming platform for real-time data pipelines, everyone knows the answer: Apache Kafka! And when it comes to deploying applications at scale without needing to integrate different pieces of infrastructure yourself, the answer nowadays is increasingly Kubernetes. However, as with all great things, the devil is truly in the details. While Kubernetes does provide all the building blocks that are needed, a lot of thought is required to truly create an enterprise-grade Kafka platform that can be used in production. In this technical deep dive, Michael and Viktor will go through the challenges and pitfalls of managing Kafka on Kubernetes as well as the goals and lessons learned from the development of the Confluent Operator for Kubernetes. NOTE: This talk will be delivered with Michael Ng, product manager, Confluent

Bulletproof Apache Kafka® with Fault Tree Analysis

  • Andrey Falko, Lyft

We recently learned about “Fault Tree Analysis” and decided to apply the technique to bulletproof our Apache Kafka deployments. In this talk, learn about fault tree analysis and what you should focus on to make your Apache Kafka clusters resilient. This talk should provide a framework for answering the following common questions a Kafka operator or user might have (a configuration sketch follows the list):

- What guarantees can I promise my users?
- What should my replication factor be?
- What should the ISR setting be?
- Should I use RAID or not?
- Should I use external storage such as EBS or local disks?
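
A minimal sketch of how some of these answers turn into configuration, using the Kafka AdminClient and topic-level settings; the topic name and the numbers are illustrative only, and choosing the right values is exactly what the fault-tree analysis is meant to guide:

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;

    import java.util.Collections;
    import java.util.Properties;

    public class ResilientTopicSetup {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            try (AdminClient admin = AdminClient.create(props)) {
                // Replication factor 3 with min.insync.replicas=2 tolerates one broker failure
                // while refusing writes that only a single replica would acknowledge.
                NewTopic topic = new NewTopic("payments", 12, (short) 3)
                        .configs(Collections.singletonMap("min.insync.replicas", "2"));
                admin.createTopics(Collections.singleton(topic)).all().get();
            }
            // Producers that rely on these guarantees should also set acks=all.
        }
    }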

Experiences Operating Apache Kafka® at Scale

  • Noa Resare, Apple

Running Apache Kafka sometimes presents interesting challenges, especially when operating at scale. In this talk we share some of our experiences operating Apache Kafka as a service across a large company. What happens when you create a lot of partitions and then need to restart brokers? What if you find yourself with a need to reassign almost all partitions in all of your clusters? How do you track progress on large-scale reassignments? How do you make sure that moving data between nodes in a cluster does not impact producers and consumers connected to the cluster? We invite you to dive into a few of the issues we have encountered and share debugging and mitigation strategies.

Event-Driven Development

Complex Event Flows in Distributed Systems

  • Bernd Ruecker, Camunda

Event-driven architectures enable nicely decoupled microservices and are fundamental for decentralized data management. However, using peer-to-peer event chains to implement complex end-to-end logic crossing service boundaries can accidentally increase coupling. Extracting such business logic into dedicated services reduces coupling and allows you to keep sight of larger-scale flows – without violating bounded contexts, harming service autonomy or introducing god services. Service boundaries get clearer and service APIs get smarter by focusing on their potentially long-running nature.

I will demonstrate how the new generation of lightweight and highly scalable state machines eases the implementation of long-running services. Based on my real-life experiences, I will share how to handle complex logic and flows which require proper reactions to failures, timeouts and compensating actions, and provide guidance backed by code examples to illustrate alternative approaches.

Event Driven Architecture with a RESTful Microservices Architecture

  • Kyle Bendickson, Tinder

Tinder’s Quickfire Pipeline powers all things data at Tinder. It was originally built using AWS Kinesis Firehoses and has since been extended to use both Kafka and other event buses. It is the core of Tinder’s data infrastructure. This rich data flow of both client and backend data has been extended to service a variety of needs at Tinder, including Experimentation, ML, CRM, and Observability, allowing backend developers easier access to shared client side data. We perform this using many systems, including Kafka, Spark, Flink, Kubernetes, and Prometheus. Many of Tinder’s systems were natively designed in an RPC first architecture.

Topics we’ll discuss around decoupling your system at scale via event-driven architectures include:

– Powering ML, backend, observability, and analytical applications at scale, including an end to end walk through of our processes that allow non-programmers to write and deploy event-driven data flows.

– Showing, end to end, the use of dynamic event processing that creates other stream processes, via a dynamic control plane topology pattern and the broadcast state pattern

– How to manage the unavailability of cached data that would normally come from repeated API calls for data that’s being backfilled into Kafka, all online! (and why this is not necessarily a “good” idea)

– Integrating common OSS frameworks and libraries like Kafka Streams, Flink, Spark and friends to encourage the best design patterns for developers coming from traditional service oriented architectures, including pitfalls and lessons learned along the way.

– Why and how to avoid overloading microservices with excessive RPC calls from event-driven streaming systems

– Best practices in common data flow patterns, such as shared state via RocksDB + Kafka Streams (see the sketch after this list), as well as the complementary tools in the Apache Ecosystem.

– The simplicity and power of streaming SQL with microservices
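
To illustrate the “shared state via RocksDB + Kafka Streams” pattern mentioned above, here is a small sketch of a persistent (RocksDB-backed) state store exposed through interactive queries; the topic and store names are made up:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.common.utils.Bytes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.Materialized;
    import org.apache.kafka.streams.state.KeyValueStore;
    import org.apache.kafka.streams.state.QueryableStoreTypes;
    import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

    import java.util.Properties;

    public class SwipeCounts {
        public static void main(String[] args) {
            StreamsBuilder builder = new StreamsBuilder();

            // Count events per user; counts live in a RocksDB-backed store named "swipe-counts".
            builder.<String, String>stream("swipe-events")
                   .groupByKey()
                   .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("swipe-counts"));

            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "swipe-counter");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();

            // Interactive query (once the instance is RUNNING): other services can read the
            // shared state directly instead of issuing another RPC.
            ReadOnlyKeyValueStore<String, Long> store =
                    streams.store("swipe-counts", QueryableStoreTypes.keyValueStore());
            Long countForUser = store.get("user-123");
        }
    }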

The Migration to Event-Driven Microservices

  • Adam Bellemare, Confluent

Flipp is an e-commerce company that promotes weekly shopping opportunities. We began our migration to event-driven microservices in November 2016, and have since moved to nearly 300 Kafka-powered microservices. In this presentation, we will explore the major strategies we have used in our migration from distributed monoliths to event-driven microservices. There have been a number of painful learnings and pitfalls along the way that we will share with you. Lastly, we will provide recommendations for each step of the way on your journey from monoliths to effective event-driven microservices.

The first major section of this presentation deals with the liberation of data from monolithic services. In this section, we will cover: Kafka Connect vs System Production, event schematization, entities and events, the importance of the Single Source of Truth, consumption patterns and event update verbosity.

The second major section of this presentation discusses the usage of liberated event data in conjunction with other event streams. In this section we will cover common access patterns, handling lots of relational data, stateful foreign-key joins in Kafka Streams (see Kafka KIP-213), high-frequency updates (price, stock) vs static properties, and how to handle too many data streams.

The third major section details how to abstract event complexity away, leverage the single source of truth and use Core Events across a company. In this section, we cover abstracting data streams, Core Events as detailed by the Single Source of Truth, Core Events in relation to bounded contexts and using Core Events successfully as a business.
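
The stateful foreign-key join referenced above (KIP-213) later shipped in Kafka Streams (Apache Kafka 2.4); a sketch of what such a join looks like with that API, using made-up topics where each item row carries a merchant id in its value:

    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.KTable;

    public class ForeignKeyJoinSketch {
        public static void sketch() {
            StreamsBuilder builder = new StreamsBuilder();

            // Both tables are keyed by their own primary keys.
            KTable<String, String> items = builder.table("items");          // value contains a merchant id
            KTable<String, String> merchants = builder.table("merchants");

            // KIP-213-style foreign-key join: extract the merchant id from each item value,
            // join against the merchants table, and keep the result keyed by item id.
            KTable<String, String> enriched = items.join(
                    merchants,
                    itemValue -> itemValue.split(",")[0],                   // foreign-key extractor
                    (itemValue, merchantValue) -> itemValue + " | " + merchantValue);
        }
    }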

Maintaining Consistency for a Financial Event-Driven Architecture

  • Iago Borges, Nubank

Nubank is leading financial technology in Latin America with a 100% digital banking experience, and has been recognized as the fastest-growing digital bank outside of Asia. Our business aims at fighting the complexity we see in Brazilian banking and empowering people to take control of their money once again. To successfully deliver an amazing experience for more than 5 million credit card customers and 2.5 million checking account customers, we created a software platform composed of more than a hundred microservices that are fast and reliable, even when facing unpredictable failures. Every day we accomplish this goal with Apache Kafka as our communication backbone. This talk will detail how we are able to successfully run our platform by applying different patterns and development techniques to create a consistent event-driven design, capable of correcting data processing failures as fast as our business requires. We’ll show how patterns like dead-letter queues, circuit breakers and back-off are applied in the architecture to ensure that failures can be handled as consistently and transparently as possible by engineers across the company. Finally, the talk will also show the set of tools that were created on this architecture to address the concerns about quick fixes of failed events, such as a homegrown CLI capable of inspecting failed events and reprocessing them as needed, all built on top of Apache Kafka.
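
As a generic illustration (not Nubank’s actual implementation) of the dead-letter-queue pattern the talk covers, a consumer loop that routes failed records to a DLQ topic might look like this:

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    import java.time.Duration;
    import java.util.Collections;

    public class DeadLetterQueueLoop {

        // Process records from "transactions"; anything that fails goes to "transactions.dlq"
        // so it can be inspected and reprocessed later without blocking the main flow.
        static void run(KafkaConsumer<String, String> consumer, KafkaProducer<String, String> producer) {
            consumer.subscribe(Collections.singletonList("transactions"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    try {
                        process(record);                     // business logic (placeholder)
                    } catch (Exception e) {
                        producer.send(new ProducerRecord<>("transactions.dlq", record.key(), record.value()));
                    }
                }
                consumer.commitSync();
            }
        }

        static void process(ConsumerRecord<String, String> record) { /* placeholder */ }
    }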

Making Sense of Your Event-Driven Dataflows

  • Jorge Esteban Quilcate Otoya, SYSCO AS

Contrary to RPC-like applications, where communication and dependencies are explicitly defined between services, the data flows between event-driven applications are defined by how they react to and emit events. A trade-off between data-flow explicitness and service autonomy becomes apparent between these two architectural styles. The goal of this presentation is to demonstrate how distributed tracing can help to cope with this trade-off, turning message exchanges between decoupled, autonomous, event-driven services into explicit data flows. The Zipkin project provides a distributed-tracing infrastructure that enables the collection, processing, and visualization of traces produced by RPC-based as well as messaging-based applications.

This presentation includes demonstrations of how to enable tracing for Kafka Streams applications, Kafka Connectors, and KSQL, showing how implicit service behavior and communication through the event log can become explicit via distributed tracing. But collecting and visualizing traces is just the first step. In order to create insights from tracing data, models have to be built to enable a better understanding of the system and improve our operational capabilities. Drawing on research-based experiences from Netflix[1] and Facebook[2] on how tracing data has been processed and polished for multiple purposes, this presentation will cover how service-dependency analysis and anomaly-detection models can be built on top of it.
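
A sketch of what enabling tracing for plain Kafka clients can look like, assuming Zipkin’s Brave instrumentation for Kafka clients (brave-instrumentation-kafka-clients); how the Tracing instance reports to Zipkin is left out:

    import brave.Tracing;
    import brave.kafka.clients.KafkaTracing;
    import org.apache.kafka.clients.consumer.Consumer;
    import org.apache.kafka.clients.producer.Producer;

    public class TracingSetup {

        // Wrap existing clients so each send/poll carries trace context in record headers.
        static <K, V> Producer<K, V> traced(Producer<K, V> producer, Tracing tracing) {
            return KafkaTracing.create(tracing).producer(producer);
        }

        static <K, V> Consumer<K, V> traced(Consumer<K, V> consumer, Tracing tracing) {
            return KafkaTracing.create(tracing).consumer(consumer);
        }
    }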

Apache Kafka® as an Event Store

  • Guido Schmutz, Trivadis

Event Sourcing and CQRS are two popular patterns for implementing a Microservices architecture. With Event Sourcing, we do not store the state of an object, but instead, store all the events impacting its state. Then to retrieve an object state, we have to read the different events related to a certain object and apply them one by one. CQRS (Command Query Responsibility Segregation) on the other hand is a way to dissociate writes (Command) and reads (Query). Event Sourcing and CQRS are frequently grouped and used together to form something bigger. While it is possible to implement CQRS without Event Sourcing, the opposite is not necessarily correct. In order to implement Event Sourcing, an efficient Event Store is needed. But is that also true when combining Event Sourcing and CQRS? And what is an event store in the first place and what features should it implement? This presentation will first discuss what functionalities an event store should offer and then present how Apache Kafka can be used to implement an event store. But is Kafka good enough or do specific event store solutions such as AxonDB or Event Store provide a better solution?
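
To anchor the discussion, a bare-bones sketch of the event-sourcing read path on top of Kafka: rebuilding state by replaying all events from the beginning. The topic name and the apply step are hypothetical, and in practice you would lean on log compaction, snapshots or a Kafka Streams state store rather than a full replay on every read:

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    import java.time.Duration;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    public class AccountStateRebuilder {

        // Rebuild current account balances by replaying all events from the beginning of the topic.
        static Map<String, Long> rebuild(KafkaConsumer<String, Long> consumer) {
            List<TopicPartition> partitions = consumer.partitionsFor("account-events").stream()
                    .map(p -> new TopicPartition(p.topic(), p.partition()))
                    .collect(Collectors.toList());
            consumer.assign(partitions);
            consumer.seekToBeginning(partitions);

            Map<String, Long> balances = new HashMap<>();
            ConsumerRecords<String, Long> records = consumer.poll(Duration.ofSeconds(1));
            while (!records.isEmpty()) {
                for (ConsumerRecord<String, Long> event : records) {
                    balances.merge(event.key(), event.value(), Long::sum);  // apply the event
                }
                records = consumer.poll(Duration.ofSeconds(1));
            }
            return balances;
        }
    }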

Streaming Design Patterns Using Alpakka Kafka® Connector

  • Sean Glover, Lightbend

Do you ever feel that your stream processor gets in the way of expressing business requirements? Most processors are frameworks, which are highly opinionated in the design and implementation of apps. Performing Complex Event Processing invariably leads to calling out to other technologies, but what if that integration didn’t require an RPC call or could be modeled into your stream itself? This talk will explore how to build rich-domain, low-latency, back-pressured, and stateful streaming applications that require very little infrastructure, using Akka Streams and the Alpakka Kafka connector.

We will explore how Alpakka Kafka maps to Kafka features in order to provide a comprehensive understanding of how to build a robust streaming platform. We’ll explore transactional message delivery, defensive consumer group rebalancing, stateful stages, and state durability/persistence. Akka Streams is built on top of Akka, an asynchronous messaging-driven middleware toolkit that can be used to build Erlang-like Actor Systems in Java or Scala. It is used as a JVM library to facilitate common streaming semantics within an existing or standalone application. It’s different from other stream processors in several ways. It natively supports back-pressure flow control inside a single JVM instance or across distributed systems to help prevent overloading downstream infrastructure. It’s perfect for modeling Complex Event Processing with its easy integration into existing apps and Akka Actor systems. Also, unlike most acyclic stream processors, Akka Streams can support sophisticated pipelines, or Graphs, by allowing the user to model cycles (loops) when there’s a need.
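
A small sketch of what a back-pressured Kafka source looks like with the Alpakka Kafka connector’s Java DSL; the topic, group id and processing step are placeholders:

    import akka.actor.ActorSystem;
    import akka.kafka.ConsumerSettings;
    import akka.kafka.Subscriptions;
    import akka.kafka.javadsl.Consumer;
    import akka.stream.ActorMaterializer;
    import akka.stream.Materializer;
    import akka.stream.javadsl.Sink;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class AlpakkaConsumerSketch {
        public static void main(String[] args) {
            ActorSystem system = ActorSystem.create("alpakka-sketch");
            Materializer materializer = ActorMaterializer.create(system);

            ConsumerSettings<String, String> settings =
                    ConsumerSettings.create(system, new StringDeserializer(), new StringDeserializer())
                            .withBootstrapServers("localhost:9092")
                            .withGroupId("alpakka-sketch");

            // The source only pulls from Kafka as fast as downstream stages can process,
            // which is the back-pressure behaviour described above.
            Consumer.plainSource(settings, Subscriptions.topics("events"))
                    .map(record -> record.value().toUpperCase())   // placeholder processing step
                    .runWith(Sink.foreach(System.out::println), materializer);
        }
    }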

Hard truths about streaming and eventing

  • Dan Rosanova, Confluent

Eventing and streaming open a world of compelling new possibilities to our software and platform designs. They can reduce time to decision and action while lowering total platform cost. But they are not a panacea. Understanding the edges and limits of these architectures can help you avoid painful missteps. This talk will focus on event-driven and streaming architectures and how Apache Kafka can help you implement these. It will also discuss key tradeoffs you will face along the way from partitioning schemes to the impact of availability vs. consistency (CAP Theorem). Finally, we’ll discuss some challenges of scale for patterns like Event Sourcing and how you can use other tools and even features of Kafka to work around them. This talk assumes a basic understanding of Kafka and distributed computing but will include brief refresher sections.

Stream Processing

Stream Processing with the Spring Framework (Like You've Never Seen It Before)

  • Tim Berglund, StarTree
  • Josh Long, VMWare

Let’s assume you are eager to refactor your existing monolith, legacy system, or other to-be-deprecated code into a shiny new event-driven system. And let us assume, while you are in the process, that you want to use the Spring Framework as your guiding architectural substrate. These are not just cool things to do; they are measured, well-informed, downright wise architectural decisions with many successful projects behind them.

In this talk, Josh Long and Tim Berglund will live-code their way to a small, functioning, microservices-based web application. We’ll use Spring Boot for the web interfaces, Apache Kafka to integrate the services, and of course Spring for Kafka as the API between the two. If you want to use Spring and Kafka together, apply the right patterns, avoid the wrong antipatterns, and generally get a good glimpse into a solid toolset for building a modern Java web application, look no further. We hope to see you in the session!
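
A taste of the kind of code the session builds live, using Spring Boot with Spring for Apache Kafka; the topic name and payload are made up, and the KafkaTemplate is assumed to be auto-configured from application properties:

    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.kafka.core.KafkaTemplate;
    import org.springframework.stereotype.Component;

    @Component
    public class GreetingsFlow {

        private final KafkaTemplate<String, String> template;

        public GreetingsFlow(KafkaTemplate<String, String> template) {
            this.template = template;
        }

        // Producing: Spring Boot wires up the KafkaTemplate for us.
        public void greet(String name) {
            template.send("greetings", name, "Hello, " + name + "!");
        }

        // Consuming: the listener container handles polling, deserialization and offset commits.
        @KafkaListener(topics = "greetings", groupId = "greetings-app")
        public void onGreeting(String message) {
            System.out.println("Received: " + message);
        }
    }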

Zen and the Art of Streaming Joins - The What, When and Why

  • Nick Dearden, Confluent

One of the most powerful capabilities of both KSQL and the Kafka Streams library is that they allow us to easily express multiple “types” of join over continuous streams of data, and then have those joins be executed in distributed fashion by a self-organizing group of machines—but how many of us really understand the intrinsic qualities of a LEFT OUTER STREAM-STREAM JOIN, SPAN(5 MINUTES, 2 MINUTES)? And what happens when that data can arrive late or out of order? In this talk, we will explore the available streaming join options, covering common uses and examples for each, how to decide between them and illustrations of what’s really happening under the covers in the face of real-world data.
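
To ground the terminology, here is roughly what a left outer stream-stream join with an asymmetric window looks like in the Kafka Streams Java API (KSQL expresses the same join declaratively); the topics and the joiner are illustrative:

    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.JoinWindows;
    import org.apache.kafka.streams.kstream.KStream;

    import java.time.Duration;

    public class StreamStreamJoinSketch {
        public static void sketch() {
            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> orders = builder.stream("orders");
            KStream<String, String> shipments = builder.stream("shipments");

            // Left outer join: every order is emitted, paired with a shipment if one arrives
            // within 5 minutes after (or 2 minutes before) the order, otherwise with null.
            KStream<String, String> joined = orders.leftJoin(
                    shipments,
                    (order, shipment) -> order + " -> " + shipment,
                    JoinWindows.of(Duration.ofMinutes(5)).before(Duration.ofMinutes(2)));
        }
    }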

Kafka Connect and KSQL: Useful Tools in Migrating from a Legacy System to Kafka Streams

  • Danica Fine, Confluent
  • Alex Leung, Bloomberg L.P.

In a microservices environment, a clean slate is the ideal starting point for writing Kafka Streams applications. However, the reality is that, in most established organizations, existing legacy systems need to be replaced while the current client base is maintained. In this talk, we will review our journey in migrating from an existing legacy system to a more scalable Kafka Streams application, all the while maintaining the stability and integrity of our data for existing customers. We will concentrate on how two technologies – Kafka Connect and KSQL – were instrumental during this migration.

We will cover:
* Our motivation for migrating to KStreams
* How Kafka Connect and KStreams concepts (KTable and GlobalKTable) allowed us to incorporate an external data store into the KStreams app (see the sketch after this list)
* How KSQL, using stream-to-stream JOINs, provided a fast and easy way to achieve A/B testing
* Kafka Connect and KSQL internals, pitfalls, and tips
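
A sketch of the KTable/GlobalKTable technique referenced in the list above: a GlobalKTable holds the full external data set on every instance, so a stream can join against it on a foreign key. The topics and extraction logic are made up:

    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.GlobalKTable;
    import org.apache.kafka.streams.kstream.KStream;

    public class GlobalTableJoinSketch {
        public static void sketch() {
            StreamsBuilder builder = new StreamsBuilder();

            // Reference data loaded into Kafka (e.g. via Kafka Connect) and replicated to every instance.
            GlobalKTable<String, String> securities = builder.globalTable("security-reference");
            KStream<String, String> trades = builder.stream("trades");

            // Join each trade to its reference record; the KeyValueMapper picks the lookup key.
            KStream<String, String> enriched = trades.join(
                    securities,
                    (tradeKey, tradeValue) -> tradeValue.split(",")[0],   // extract the security id
                    (tradeValue, refValue) -> tradeValue + " | " + refValue);
        }
    }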

Kafka Streams at Scale

  • Deepak Goyal, Walmart Labs

Walmart.com generates millions of events per second. At WalmartLabs, I’m working in a team called the Customer Backbone (CBB), where we wanted to upgrade to a platform capable of processing this event volume in real time and storing the state/knowledge of possibly all Walmart customers generated by the processing. Kafka Streams’ event-driven architecture seemed like the only obvious choice.

However, there are a few challenges w.r.t. Walmart’s scale:
• the clusters need to be large, and the problems that come with that.
• infinite retention of changelog topics, wasting valuable disk.
• slow standby task recovery in case of a node failure (changelog topics have GBs of data).
• no repartitioning in Kafka Streams.

As part of the event-driven development and addressing the challenges above, I’m going to talk about some bold new ideas we developed as features/patches to Kafka Streams to deal with the scale required at Walmart.
• Cold Bootstrap: in case of a Kafka Streams node failure, instead of recovering from the changelog topic, we bootstrap the standby from the active’s RocksDB using JSch, with zero event loss thanks to careful offset management.
• Dynamic Repartitioning: we added support for repartitioning in Kafka Streams where state is distributed among the new partitions. We can now elastically scale to any number of partitions and any number of nodes.
• Cloud/Rack/AZ-aware task assignment: no active and standby tasks of the same partition are assigned to the same rack.
• Decreased Partition Assignment Size: with large clusters like ours (>400 nodes and 3 stream threads per node), the partition assignment of the Kafka Streams cluster grows to a few hundred MBs, so rebalances take a long time to settle.

Key Takeaways:
• Basic understanding of Kafka Streams.
• Productionizing Kafka Streams at scale.
• Using Kafka Streams as a distributed NoSQL DB.

Real Time Streaming Data with Apache Kafka® and TensorFlow

  • Yong Tang, MobileIron

In mission-critical real-time applications, using machine learning to analyze streaming data is gaining momentum. In those applications, Apache Kafka is the most widely used framework to process the data streams. It typically works with other machine learning frameworks for model inference and training purposes. In this talk, our focus is to discuss the KafkaDataset module in TensorFlow. KafkaDataset feeds Kafka streaming data directly into TensorFlow’s graph. As a part of TensorFlow (in ‘tf.contrib’), the implementation of KafkaDataset is mostly written in C++. The module exposes a machine-learning-friendly Python interface through TensorFlow’s ‘tf.data’ API. It can be fed directly to ‘tf.keras’ and other TensorFlow modules for training and inference purposes. Combined with Kafka streaming itself, the KafkaDataset module in TensorFlow removes the need for an intermediate data processing infrastructure. This helps many mission-critical real-time applications to adopt machine learning more easily. At the end of the talk, we will walk through a concrete example with a demo to showcase the usage we described.

Cost Effectively and Reliably Aggregating Billions of Messages Per Day Using Apache Kafka®

  • Chunky Gupta, Mist Systems
  • Osman Sarood, Mist Systems

In this session, we will discuss Live Aggregators (LA), Mist’s highly reliable and massively scalable in-house real-time aggregation system that relies on Kafka for ensuring fault tolerance and scalability. LA consumes billions of messages a day from Kafka with a memory footprint of over 750 GB and aggregates over 100 million timeseries. Since it runs entirely on top of AWS spot instances, it is designed to be highly reliable. LA can recover from hours-long complete EC2 outages using its checkpointing mechanism that depends on Kafka. This recovery mechanism recovers the checkpoint and replays messages from Kafka where it left off, ensuring no data loss. The characteristic that sets LA apart is its ability to autoscale by intelligently learning about resource usage and allocating resources accordingly. LA emits custom metrics that track resource usage for different components, i.e., Kafka consumer, shared memory manager and aggregator, to achieve server utilization of over 70%. We do multi-level aggregations in LA to intelligently solve load imbalance issues among different partitions for a Kafka topic. We’ll demonstrate multi-level aggregation using an example in which we aggregate indoor location data coming from different organizations both spatially and temporally. We’ll explain how changing the partitioning key, along with writing intermediate data back to Kafka in a new topic for the next-level aggregators, helps Mist scale our solution. LA runs on top of 400+ cores, comprised of 10+ different Amazon EC2 spot instance types/sizes. We track the CPU usage for reading each Kafka stream on all the different instance types/sizes. We have several months of such data from our production Mesos cluster, which we are incorporating into LA’s scheduler to improve our server utilization and avoid CPU hot spots developing on our cluster. Detailed blog: https://www.mist.com/live-aggregators-highly-reliable-massively-scalable-real-time-aggregation-system/

KSQL and Security: The Current State of Affairs

  • Victoria Xia, Confluent

As KSQL users move from development to production, security becomes an important consideration. Because KSQL is built on top of Kafka Streams, which in turn is built on top of Kafka Consumers and Producers, KSQL can leverage existing security functionality, including SSL encryption and SASL authentication in communications with Kafka brokers. However, authentication and authorization between KSQL servers and KSQL clients is a different story. As of December 2018, SSL for communication between KSQL clients and servers is enabled for the REST API, but not yet for the CLI. By April 2019, SSL will be supported in the KSQL CLI, and additional security functionality including SASL authentication, ACLs, audit logs, and RBAC will be in the works as well. This talk will cover the security options available for KSQL, including any new options added by April 2019, and will also include a preview of features to come. Audience members will leave with an understanding of what security features are currently available, how to configure them, current limitations, and upcoming features. The talk may also include common pitfalls and tips for debugging a KSQL security setup.

Use Apache Gradle to Build and Automate KSQL and Kafka Streams

  • Stewart Bryson, Red Pill Analytics

KSQL is an easy-to-use and easy-to-understand streaming SQL engine for Apache Kafka built on top of Kafka Streams. The ability to write streaming applications using only SQL makes Apache Kafka available to a whole range of new developers and potential use cases, either as a stand-alone solution or as a single component to a broader Kafka Streams implementation. Inspired by a customer project now in production, experience the lifecycle of a streaming application developed using KSQL and Kafka Streams. With Apache Gradle as our build framework, we’ll explore the open-source Gradle plugin we built during this project to improve developer efficiency and automate the deployment of KSQL pipelines, user-defined functions, and Kafka Streams microservices.

We’ll demonstrate the deployment process live, and discuss design decisions around incorporating SQL-based processes into an overall streaming application.

Key Takeaways
1. KSQL is a natural choice for expressing data-driven applications, but it may not naturally fit into established DevOps processes and automations.
2. We built an open-source Gradle plugin to handle all aspects of deploying a Kafka-based streaming application: KSQL pipelines, KSQL user-defined functions, and Kafka Streams microservices.
3. KSQL pipelines can be deployed using either a server start script, or the KSQL REST API, and our Gradle plugin fully supports both options.

Use Cases

Building Scalable and Extendable Data Pipeline for Call of Duty Games

  • Yaroslav Tkachenko, Goldsky

What’s easier than building a data pipeline nowadays? You add a few Apache Kafka clusters and a way to ingest data (probably over HTTP), design a way to route your data streams, add a few stream processors and consumers, integrate with a data warehouse… wait, this looks like a lot of things, doesn’t it? And you probably want to make it highly scalable and available too. Join this session to learn best practices for building a data pipeline, drawn from my experience at Activision/Demonware. I’ll share the lessons learned about scaling pipelines, not only in terms of volume but also in terms of supporting more games and more use cases. You’ll also hear about message schemas and envelopes, Apache Kafka organization, topic naming conventions, routing, reliable and scalable producers and the ingestion layer, as well as stream processing.

Data Transformations on Ops Metrics using Kafka Streams

  • Srividhya Ramachandran, Priceline.com

Priceline uses Kafka Streams to save terabytes per day against the licenses of our monitoring systems. Kafka Streams powers a big part of our analytics and monitoring pipelines and delivers operational metrics transformations in real time. All logs and operational metrics from all of the APIs of Priceline’s products flow into Kafka and are ingested into our monitoring system, Splunk, for alerting and monitoring. We have now implemented data transformations, aggregations and summarizations using Kafka Streams to effectively eliminate PCI/PII violations in the log data, aggregate metrics to avoid ingesting sub-second data points, and ingest metrics only at the granularity that we need. We will cover the need for custom Serdes, custom partitioners, and why we don’t use the Confluent Schema Registry. You will also learn how Priceline uses a self-service model to configure its streams, topics and consumers using the Data Collection Console, our UI for managing the Kafka streaming pipelines.
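
As a generic illustration (not Priceline’s actual code) of the kind of transformation described, a Kafka Streams topology that redacts sensitive fields and rolls sub-second metrics up to one-minute counts might look like this; the topics and the redaction rule are invented:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.Grouped;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.Produced;
    import org.apache.kafka.streams.kstream.TimeWindows;

    import java.time.Duration;

    public class LogPipelineSketch {
        public static void sketch() {
            StreamsBuilder builder = new StreamsBuilder();

            // Redact card-number-like strings before the events reach the monitoring system.
            KStream<String, String> rawLogs = builder.stream("raw-logs");
            rawLogs.mapValues(value -> value.replaceAll("\\b\\d{13,16}\\b", "[REDACTED]"))
                   .to("clean-logs", Produced.with(Serdes.String(), Serdes.String()));

            // Roll sub-second metrics up to one-minute counts so only the needed granularity is ingested.
            builder.<String, String>stream("raw-metrics")
                   .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
                   .windowedBy(TimeWindows.of(Duration.ofMinutes(1)))
                   .count()
                   .toStream((windowedKey, count) -> windowedKey.key())
                   .mapValues(Object::toString)
                   .to("metrics-1m", Produced.with(Serdes.String(), Serdes.String()));
        }
    }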

Machine Learning on Streaming Data using Kafka, Beam, and TensorFlow

  • Mikhail Chrestkha, Google Cloud
  • Stephane Maarek, DataCumulus

Are you already using Apache Kafka as your primary messaging platform for streaming events? Would you like to extend your streaming platform for machine learning? Join us to learn about building a streaming machine learning pipeline with Kafka, Beam and TensorFlow on Google Cloud Platform using Confluent Cloud, Dataflow and Cloud Machine Learning Engine.

Apache Kafka® Pluggable Authorization for Enterprise Security

  • Anna Kepler, Viasat

At Viasat, Kafka is a backbone for a multi-tenant streaming platform that transports data for 1,000 streams and is used by more than 60 teams in a production environment. Role-based access control to sensitive data is an essential requirement for our customers, who must comply with a variety of regulations including GDPR. Kafka ships with a pluggable Authorizer that can control access to resources like the cluster, topics or consumer groups. However, maintaining ACLs in a large multi-tenant deployment can be support-intensive. At Viasat, we developed a custom Kafka Authorizer and Role Manager application that integrates our Kafka cluster with Viasat’s internal LDAP services. The presentation will cover how we designed and built the Kafka LDAP Authorizer, which allows us to control resources within the cluster as well as services built around Kafka. We apply our permissions model to our data forwarders, ETL jobs, and stream processing. We will also share how we achieved a stress-free migration to secure infrastructure without interruption to the production data flow. Our secure deployment model accomplishes multiple goals:

– Integration into an LDAP central authentication system.
– Use of the same authorization service to control permissions to data in Kafka as well as services built around Kafka.
– Delegation of permissions control to the security officers on the teams using the service.
– Detailed audit and breach notifications based on the metrics produced by the custom authorizer.

We plan to open source our custom Kafka Authorizer.

Keeping Your Data Close and Your Caches Hotter

  • Ricardo Ferreira, Amazon Web Services

The distributed cache is becoming a popular technique to improve performance and simplify the data access layer when dealing with databases. Bringing the data as close as possible to the CPU allows unparalleled execution speed as well as horizontal scalability. This approach is often successful when used in a microservices design in which the cache is accessed only by a single API. However, it becomes more challenging if multiple applications are involved and changes are made to the database directly by other applications. The data held in the cache eventually becomes stale and no longer consistent with its underlying database. When consistency problems arise, the engineering team must address them through additional coding, which directly jeopardizes the team’s ability to be agile between releases. This talk presents a set of patterns for cache-based architectures that aim to keep the caches always hot, using Apache Kafka and its connectors to accomplish that goal. It will be shown how to set up these patterns across different IMDGs such as Hazelcast, Apache Ignite or Coherence. These patterns can be used in conjunction with different cache topologies such as cache-aside, read-through, write-behind, and refresh-ahead, making them reusable enough to serve as a framework for achieving data consistency in any architecture that relies on distributed caches.
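
As a sketch of the refresh pattern described (with a plain ConcurrentHashMap standing in for Hazelcast, Ignite or Coherence), a consumer keeps the cache hot by applying every change event captured from the database; the topic name assumes a CDC connector is in place:

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class CacheRefresher {

        // Stand-in for the IMDG; in practice this would be a Hazelcast/Ignite/Coherence map.
        private final Map<String, String> cache = new ConcurrentHashMap<>();

        // "customers-changes" is assumed to be fed by a CDC connector capturing the database.
        void run(KafkaConsumer<String, String> consumer) {
            consumer.subscribe(Collections.singletonList("customers-changes"));
            while (true) {
                for (ConsumerRecord<String, String> change : consumer.poll(Duration.ofMillis(500))) {
                    if (change.value() == null) {
                        cache.remove(change.key());        // tombstone: the row was deleted
                    } else {
                        cache.put(change.key(), change.value());
                    }
                }
            }
        }
    }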

From a Million to a Trillion Events Per Day: Stream Processing in Ludicrous Mode

  • Shrijeet Paliwal, Tesla

In this talk, we’ll describe the evolution of stream processing at Tesla and the challenges that are specific to our needs, such as large skews in message-processing latencies. We’ll describe how we built a reliable and performant ingestion platform that allows us to take an idea from a whiteboard to production in just a matter of hours. We’ll also discuss the design principles, tools, and incident response processes that have enabled a small team to support Kafka and downstream services in highly-available and multi-tenant environments at scale.

How To Use Apache Kafka® and Druid to Tame Your Router Data

  • Rachel Pedreschi, Imply Data

Do you know who is knocking on your network’s door? Have new regulations left you scratching your head about how to handle what is happening in your network? Network flow data helps answer many questions across a multitude of use cases including network security, performance, capacity planning, routing, operational troubleshooting and more. Today’s modern streaming data pipelines need to include tools that can scale to meet the demands of these service providers while continuing to provide responsive answers to difficult questions. In addition to stream processing, data needs to be stored in a redundant, operationally focused database to provide fast, reliable answers to critical questions. Kafka and Druid work together to create such a pipeline.

In this talk Eric Graham and Rachel Pedreschi will discuss these pipelines and cover the following topics:

– Network flow use cases and why this data is important.
– Reference architectures from production systems at a major international bank.
– Why Kafka, Druid and other OSS tools for network flows.
– A demo of one such system.

Streaming Ingestion of Logging Events at Airbnb

  • Hao Wang, Airbnb

Logging is a critical piece of Airbnb’s infrastructure. A massive volume of logging events is emitted to Kafka and ingested continuously into the data warehouse for offline analytics. Logging data drives critical business and engineering functions and decision-making, such as building search ranking models, experimentation (A/B testing), and identifying fraud and risks. At Airbnb, logging events are published to Kafka from services and clients. The logging events are then ingested from Kafka into the data warehouse in near real time using Airstream (a product built on top of Spark Streaming). With the rapid increase of logging traffic, we have seen some serious engineering challenges in the scalability and reliability of the streaming ingestion infrastructure. For example, we cannot simply increase the parallelism of Spark reading from Kafka or throw more resources at it when the event rate spikes. Another serious problem is large skew in event size – some events are much bigger than others. The out-of-the-box Kafka reader in Spark treats all topics equally, so the running time of tasks in a stage can vary a lot due to event size, which results in huge computing inefficiency and lag in a streaming job.

In this talk, I will start with an overview of the logging architecture at Airbnb. I will then dive into the scalability and reliability issues that emerged and share how we scaled the infrastructure to handle 10X growth of logging traffic.

The key takeaways include

  1. an architecture of near real-time and large-scale logging event ingestion from Kafka using Spark streaming;
  2. limitation of Kafka consumer in Spark and a more scalable solution;
  3. lessons learned from productionizing a critical real-time data infrastructure.
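
For readers unfamiliar with the Spark side of this setup, a minimal Java sketch of reading Kafka from Spark, shown here with Structured Streaming; Airbnb’s Airstream is built on Spark Streaming, so this illustrates the integration point rather than their code, and the topic name and sink are placeholders:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.streaming.StreamingQuery;

    public class KafkaToWarehouseSketch {
        public static void main(String[] args) throws Exception {
            SparkSession spark = SparkSession.builder().appName("logging-ingestion").getOrCreate();

            // One streaming DataFrame over the logging topic; each row carries key, value,
            // topic, partition, offset and timestamp columns.
            Dataset<Row> events = spark.readStream()
                    .format("kafka")
                    .option("kafka.bootstrap.servers", "broker:9092")
                    .option("subscribe", "logging-events")
                    .option("startingOffsets", "latest")
                    .load()
                    .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "topic", "timestamp");

            // Placeholder sink; the real pipeline writes to the data warehouse.
            StreamingQuery query = events.writeStream().format("console").start();
            query.awaitTermination();
        }
    }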