> tech talk by ashish

Spark Structured Streaming with Kafka @ Qubole, Bangalore

Big Data, Spark, Kafka, Events, Archive

An event to remember! That is how special the "Spark Structured Streaming with Kafka" meetup was for me. Without further delay, let me dive into the topics I liked and what made the event so special for me and for the many other Big Data enthusiasts present at the venue.

The event started on Saturday morning at 10 o'clock sharp at the Qubole office in HSR Layout, Bangalore. For those who do not know Qubole, it is the company started by two Indians, Ashish Thusoo and Joydeep Sen Sarma, who worked at Facebook leading the project that developed Apache Hive. Qubole currently has offices in Santa Clara, California, USA and Bangalore, India. It is a cloud-agnostic provider of Big Data as a service. That's about the company; now on to the talks at the meetup.

The name 'Spark Structured Streaming with Kafka' was fully justified: the topics covered were to the point and hands-on, and the examples were built on simple, real-time scenarios. Topics covered at the meetup:

  1. Structured Streaming with Kafka by Sashidhar

Attention to detail ensured that the examples chosen for demonstration served the purpose of the event, which was to help Big Data enthusiasts understand complicated theory through simple examples. Sashidhar covered various topics, such as:

(i) Data collection tools used in the industry, such as Logstash and Fluentd

(ii) Data ingestion tools such as RabbitMQ, known for its data ingestion reliability on a single machine

(iii) Spark and Kafka compatibility

(iv) Some good, easy-to-understand examples explaining the concept of streaming with Kafka and Spark in detail, written in Scala

(v) The Kafka sink

(vi) Checkpointing and its importance

(vii) What's coming in future versions of Spark
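For readers who were not at the meetup, here is a minimal sketch of how the Kafka source, the Kafka sink and checkpointing fit together in Spark Structured Streaming with Scala. This is not the code shown in the talk. The broker address `localhost:9092`, the topic names `events-in` and `events-out`, the checkpoint path and the upper-casing step are all assumptions made up for illustration, and the job needs the `spark-sql-kafka-0-10` package on the classpath plus a running Kafka broker.

```scala
import org.apache.spark.sql.SparkSession

object KafkaToKafkaSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KafkaToKafkaSketch")
      .getOrCreate()

    // Source: subscribe to a Kafka topic.
    // Broker address and topic name are placeholders (assumptions).
    val input = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events-in")
      .load()

    // Kafka records arrive with binary key/value columns, so cast them
    // to strings. Upper-casing the value is only a stand-in transformation.
    val transformed = input.selectExpr(
      "CAST(key AS STRING) AS key",
      "upper(CAST(value AS STRING)) AS value"
    )

    // Sink: write the result to another Kafka topic.
    // checkpointLocation is where Spark stores offsets and progress so
    // the query can resume after a restart; the path is a placeholder.
    val query = transformed.writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("topic", "events-out")
      .option("checkpointLocation", "/tmp/checkpoints/kafka-to-kafka-sketch")
      .start()

    // Block until the streaming query is stopped or fails.
    query.awaitTermination()
  }
}
```

The Kafka sink expects a `value` column (and optionally `key`), which is why the sketch keeps those two column names. It also requires a checkpoint location, which ties directly to the checkpointing point above.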

  2. Optimising S3 Write-heavy Spark Workload by Bharath Bhushan

(i) Some really helpful insights on writing and storing results to S3 in an optimized manner

(ii) DirectFileOutputCommitter (DFOC)

(iii) The solutions and optimizations achieved at Qubole over the years using parallelism in Spark, which were well explained

Overall, the event was a treat for all the Big Data fans out there. Several great question-and-answer discussions took place and knowledge was shared freely; it really felt more like a community event than a company event. The meetup was well organised, and the icing on the cake was the Qubole team serving a sumptuous lunch with drinks for the guests, plus a T-shirt as a takeaway. Thanks to the Qubole team for putting on such an amazing show. I would be glad to be part of many more such events.

You may reach me at: https://www.linkedin.com/in/ashish-vishwakarma-56023a2b/