> tech talk by ashish
> senior_data_engineer

Hi, I'm Ashish Vishwakarma. I build data products that help businesses make smart decisions.

Senior Data Engineer on Global Payments at TikTok, based in San Jose. Over a decade building distributed pipelines and lakehouse platforms across payments, semiconductors, insurance, and telecom.

Experience

TikTok · Intel · Accenture

AT&T · Travelers Insurance engagements at Accenture

10+ years experience
4 industries
100+ GB daily pipeline data
10x efficiency gains
> about

A bit about me

I'm a Senior Data Engineer with over a decade of experience building production-grade data platforms: distributed pipelines, lakehouse architectures, and analytics-ready data products. My work has spanned four very different industries: payments, semiconductors, insurance, and telecom.

What ties it together is ownership. I like taking a business-critical dataset and owning it end to end: ingestion, transformation, data modeling, quality, SLA observability, performance tuning, and the analytics that stakeholders actually rely on.

Right now I'm focused on Global Payments data intelligence at TikTok: SLA monitoring, payments reporting, AI/LLM data preparation, and modernizing legacy big-data systems into scalable cloud and lakehouse platforms.

> experience

Where I've worked

A decade of building data systems across payments at TikTok, semiconductors at Intel, and insurance and telecom at Accenture.

TikTok USDS

Senior Data Engineer, Global Payments

Jan 2026 - Present  ·  San Jose, CA

Continuation of the Global Payments data engineering role below. The same team and responsibilities transferred into TikTok U.S. Data Security (USDS), the U.S. data-security joint venture, in January 2026.

PySpark · YARN · Payments Data · Refund & Payout

ByteDance (TikTok)

Senior Data Engineer, Global Payments

Dec 2024 - Jan 2026  ·  San Jose, CA

  • Built and own the single source of truth for Refund and Payout data across TikTok Shop, TikTok Ads, and non-TikTok apps, consolidating fragmented payment flows into trusted, business-critical datasets used by analytics, product, and operations teams.
  • Designed payments data models and pipelines that power reporting, AI/LLM data preparation, PSR reporting, and campaign recap analytics across the Global Payments domain.
  • Optimized high-priority task execution by migrating critical Dorado workloads to a dedicated YARN queue, reducing resource contention and improving reliability for time-sensitive pipelines.
  • Implemented an SLA monitoring and alerting framework for core payment datasets using Aeolus and internal metadata, shifting reliability from reactive incident response to proactive monitoring.
  • Partner with upstream e-commerce, product, and data platform teams to improve trust in payments datasets across US-TTP, ROW, and related regional flows.
PySpark · YARN · Payments Data · Refund & Payout · Aeolus

Intel

Cloud Software Engineer

Nov 2021 - Oct 2024  ·  Folsom, CA

  • Designed and implemented Intel's foundational telemetry data model, scalable to 100+ GB of daily data, serving as the base layer for two analytics data marts and Superset dashboards.
  • Built end-to-end AWS data pipelines following medallion architecture across ingestion, transformation, curated modeling, and analytics consumption.
  • Designed hyperscaler customer pipelines using AWS Glue, Lambda, Kinesis Data Streams, Step Functions, S3, and Redshift to process 100+ packages concurrently.
  • Led AI dataset reconciliation pipelines for pre-silicon data, collaborating with teams in China, Poland, and India to curate datasets for data science using PySpark and Databricks.
  • Processed unstructured PDF/HTML data with Unstructured.io to build high-quality datasets for LLM fine-tuning, an approach adopted by multiple Intel teams.
  • Led migration from legacy Hive tables to Apache Iceberg, enabling lakehouse capabilities: time travel, faster queries, schema evolution, and reliable historical data management.
  • Led design and code reviews focused on AWS cost optimization and Spark/Python performance tuning across Glue jobs, RDS workloads, and Kinesis streams.
AWS Glue · Lambda · Kinesis · Redshift · PySpark · Databricks · Apache Iceberg

Accenture · Travelers Insurance

Data Engineer Tech Lead

May 2021 - Oct 2021  ·  Hartford, CT

  • Led a team of 5 data engineers migrating 3 big data projects from on-prem Hadoop to AWS, improving scalability and cloud readiness for insurance analytics workloads.
  • Designed and built an advanced analytics pipeline processing 10+ GB of business insurance data using PySpark on Databricks, AWS, and Snowflake.
  • Directed a SAS-to-PySpark migration for legacy analytics with 1:1 validation, improving processing efficiency 10x.
  • Optimized critical PySpark pipelines for international insurance reporting, cutting runtime from 3 hours to 20 minutes through Spark tuning and refactoring.
PySpark · Databricks · AWS · Snowflake

Accenture · Travelers Insurance

Senior Data Engineer

Aug 2019 - May 2021  ·  Hartford, CT

  • Built and maintained production pipelines ingesting from Teradata, SQL Server, and Oracle using Sqoop and PySpark on YARN for pre-issuance insurance analytics.
  • Reduced PySpark job runtime from 2 hours to 15 minutes by optimizing executor configuration, file distribution, partitioning, and parallelism.
  • Developed reusable PySpark utilities for parsing complex XML and JSON, improving consistency and speed of semi-structured ingestion.
  • Partnered with technical architects and business teams to improve data interfaces, pipeline performance, and reliability of analytics-ready data.
PySpark · Sqoop · YARN · Teradata

Accenture · AT&T

Senior Software Engineer

Dec 2016 - Jul 2019  ·  Bangalore, India

  • Created big data proofs of concept using Hadoop, Hive, Spark, and Sqoop to demonstrate scalable ingestion and transformation for telecom requirements.
  • Re-engineered a critical ingestion pipeline processing 100+ GB of data, improving processing efficiency and reliability.
  • Authored complex Hive SQL transformations with embedded business logic and optimized query patterns for large-scale telecom analysis.
  • Used HBase and Apache Phoenix for NoSQL storage and fast-retrieval use cases.
Hadoop · Hive · Spark · Sqoop · HBase

Accenture · AT&T

Software Engineer / Associate Software Engineer

Nov 2013 - Nov 2016  ·  Bangalore, India

  • Implemented Sqoop-based ingestion from legacy source systems into the data lake and built Oozie workflows to automate recurring Hadoop tasks.
  • Built ETL with Revenue Assurance ETL tooling and automated UNIX shell scripts for duplicate detection, data validation, and operational alerting.
  • Delivered Hadoop, Hive, and HBase training sessions to grow team capability.
Hadoop · Sqoop · Oozie · Hive · HBase
> skills

Tools & technologies

The stack I use to build, run, and keep data platforms reliable.

Data Engineering

  • Python
  • Advanced SQL
  • PySpark
  • Apache Spark
  • Hive
  • Hadoop
  • HDFS
  • ETL / ELT
  • Batch & incremental processing
  • Data reconciliation

Lakehouse & Warehousing

  • Apache Iceberg
  • Databricks
  • Snowflake
  • Redshift
  • Medallion architecture
  • Schema evolution
  • Partitioning strategy
  • Dimensional modeling

Cloud & Orchestration

  • AWS Glue
  • S3
  • Lambda
  • Step Functions
  • Kinesis
  • CloudWatch
  • EventBridge
  • Databricks Workflows
  • Airflow
  • Oozie

Streaming & Distributed Systems

  • Kafka
  • Kinesis Data Streams
  • Event-driven ingestion
  • Distributed processing
  • YARN
  • Resource optimization
  • Workload tuning

Data Reliability & Governance

  • SLA monitoring
  • Data observability
  • Data quality checks
  • Alerting
  • Incident triage
  • Lineage-aware debugging
  • Production support

DevOps & Platform

  • Terraform
  • Docker
  • Kubernetes
  • GitHub Actions
  • CI/CD
  • Git
  • Prometheus
  • Grafana

Databases & NoSQL

  • PostgreSQL
  • Oracle
  • SQL Server
  • Teradata
  • HBase
  • MongoDB
  • Cassandra
  • DynamoDB
  • Apache Phoenix

AI / LLM Data

  • Unstructured data processing
  • PDF / HTML parsing
  • AI-ready datasets
  • Training data preparation
  • Dataset validation
> projects

Selected work

A few projects that show what I build and the impact they had.

TikTok

Payments Single Source of Truth

Built the single source of truth for Refund and Payout data across TikTok Shop, TikTok Ads, and non-TikTok apps, consolidating fragmented payment flows into trusted, business-critical datasets that analytics, product, and operations teams rely on.

PySpark · Data Modeling · Payments Data · Refund & Payout
Intel

Hive → Apache Iceberg Lakehouse Migration

Migrated legacy Hive tables to Apache Iceberg, unlocking lakehouse capabilities: time travel, schema evolution, faster queries, and reliable historical data management.

Apache Iceberg · Hive · AWS · Lakehouse
Intel

Telemetry Data Platform

Designed Intel's foundational telemetry data model scaling to 100+ GB/day, serving as the base layer for two analytics data marts and Superset dashboards used across analyst and engineering teams.

AWS Glue · Kinesis · Redshift · Medallion Architecture
Intel

AI / LLM Data Preparation Pipelines

Processed unstructured PDF and HTML data with Unstructured.io to build high-quality datasets for LLM fine-tuning. The approach was adopted by multiple Intel teams for AI data preparation.

Unstructured.io · PySpark · Databricks · LLM Data
Accenture

SAS-to-PySpark Modernization

Directed the migration of legacy SAS analytics workloads to PySpark on AWS and Databricks for Travelers Insurance, with 1:1 output validation. The modernization improved processing efficiency 10x and cut long-term dependency on legacy codebases.

PySpark · AWS · Databricks · SAS Migration
> conferences

Conferences & community

Recent talks, conferences, and events I've been part of, straight from my LinkedIn.

Stripe Sessions
DevWorld Conference, DeveloperWeek
Widen the Window
AI & Big Data Expo 2025
> what i bring

What I bring to a team

Platform modernization

A proven record migrating legacy Hadoop, Hive, and SAS workloads to AWS, Databricks, Snowflake, and Iceberg lakehouses.

Reliability engineering

I build SLA monitoring, alerting, and observability for business-critical production data pipelines.

Performance obsession

I've repeatedly cut pipeline runtimes dramatically: 3 hours to 20 minutes, 2 hours to 15 minutes, 10x efficiency gains.

Cross-domain range

Delivered across payments, semiconductors, insurance, and telecom. I ramp fast in unfamiliar domains.

> recognition

Recognition & education

Awards

  • ACE Award for Delivery Excellence, Accenture (FY16)
  • ACE Award for Delivery Excellence, Accenture (FY19)

Education

Bachelor of Engineering, Computer Science

Thakur College of Engineering & Technology

Mumbai, India

> contact

Get in touch

Have a question, an idea, or just want to talk data? Drop me a line.