
Hi, I am Raj Kumar. I am a Cloud AI & Data Engineer with 7+ years of experience. I architect secure cloud foundations and build intelligent data solutions, bridging the gap between robust data engineering and generative AI.
About Me
Cloud, AI & Data Engineer with 7+ years of experience building secure, scalable data platforms within the Microsoft Azure ecosystem and AWS. I specialize in turning raw, complex data into trusted, analytics-ready assets that enterprises can act on with confidence.
My core strength is end-to-end data engineering. I design and orchestrate pipelines using Azure Data Factory, Databricks, Synapse, and Microsoft Fabric, applying Medallion Architecture and Delta Lake to build lakehouses that are clean, performant, and governed. My daily toolkit includes PySpark, Spark SQL, Python, and SQL, paired with Snowflake for multi-cloud flexibility and Terraform for infrastructure-as-code deployments. I've built systems that handle real-time streaming with Kafka and deliver business insights through Power BI.
What sets me apart is that I don't stop at pipelines. I extend data platforms into intelligent systems using Azure AI Foundry and RAG-based architectures, building production-grade GenAI solutions that sit on top of the reliable data foundations I've already engineered.
Having recently graduated with my Master's in Computer Science (Dec 2025) from Concordia University Chicago, I'm open to relocation and ready to bring this full-stack data engineering perspective to a team solving hard problems at scale.
Outside of work, I'm a morning guy. I enjoy playing soccer, going to the gym, watching movies, and spending time with my friends. I also love making coffee and lattes.
My projects
DisputeAI
An intelligent financial dispute resolution system using Agentic AI and RAG on Azure to automate case analysis and generate citation-backed recommendations.
- Azure AI Foundry
- Azure OpenAI
- RAG
- Azure AI Search
- LangChain
- Agentic AI
- Python
Snowflake AI Sales Insights
An AI-powered sales analytics platform using Snowflake Cortex Analyst with a natural-language Streamlit chat interface for querying live sales data.
- Snowflake
- Cortex Analyst
- Streamlit
- Snowpark
- Python
- SQL
- Data Science
Credit Fraud Detection
A production-grade credit card fraud detection ML pipeline with PySpark feature engineering, model training, and a Power BI monitoring dashboard.
- Azure Databricks
- PySpark
- Scikit-learn
- XGBoost
- Azure ML
- MLflow
- Power BI
Bicycle Sales Analytics
An end-to-end data pipeline with Snowflake data warehousing and Sigma Computing dashboards, following Medallion architecture for bicycle sales and accessories data.
- Snowflake
- SQL Server
- Docker
- Sigma Computing
- Star Schema
- Medallion Architecture
Earthquake Analysis
A data engineering platform analyzing global earthquake data using Microsoft Fabric Lakehouse and Azure Databricks with interactive Power BI dashboards.
- Microsoft Fabric
- Azure Databricks
- Delta Lake
- PySpark
- Python
- Pandas
- Power BI
AWS VPC & EC2 Infrastructure
A hands-on AWS networking project demonstrating custom VPC setup with public/private subnets, NAT Gateway, Bastion Host, and secure EC2 access patterns.
- AWS VPC
- EC2
- NAT Gateway
- Bastion Host
- Security Groups
- Route Tables
DITA
Data is ingested from an on-premises source, transformed using data engineering tools, and analyzed through visualization tools.
- MS SQL Server
- Azure Data Lake
- Data Factory
- Databricks
- Synapse Analytics
- Power BI
Product Sales Analytics
An interactive Power BI report built on the AdventureWorks database that visualizes sales performance.
- Power Query
- Power BI
- M language
- DAX
Supply Chain Analytics
An end-to-end analytics pipeline on Azure Databricks processing supply chain and sales data using Medallion architecture with Delta Lake.
- Databricks
- PySpark
- SQL
- Delta Lake
- Time Travel
- Multi Hop
- Unity Catalog
My skills
- Azure
- AWS
- Microsoft Fabric
- Snowflake
- Databricks
- Azure Data Factory
- Azure Synapse Analytics
- Delta Lake
- Medallion Architecture
- ETL
- Data Warehousing
- Data Modelling
- Spark
- PySpark
- Spark SQL
- Kafka
- Hadoop
- Hive
- Azure OpenAI / GenAI
- Azure AI Search
- MLflow
- Python
- SQL
- T-SQL
- Scala
- MS SQL Server
- PostgreSQL
- Power BI
- Tableau
- Data Visualization
- Docker
- Kubernetes
- Terraform
- Azure DevOps
- GitHub Actions
- Jenkins
- CI/CD
- Git
- GitHub
- Agile Methodologies
- JIRA
My experience
Discover Financial Services
Senior Cloud Data Engineer
Chicago, IL, USA
- Led modernization of enterprise analytics platform by implementing a Microsoft Fabric Lakehouse with Medallion architecture, unifying siloed financial datasets into a governed single source of truth for BI and ML.
- Designed fault-tolerant, metadata-driven ingestion pipelines using Azure Data Factory and Fabric Data Pipelines with incremental loads and watermarking, maintaining 99.7%+ SLA across 15+ source systems.
- Developed PySpark and Spark SQL transformations in Azure Databricks and Fabric Notebooks, reducing processing time by ~40% through Delta Lake optimization, Z-order indexing, and partition pruning.
- Delivered a production-grade RAG solution using Azure OpenAI (GPT-4) + Azure AI Search, reducing manual document lookup effort by ~60% for compliance and support teams.
- Built Power BI dashboards on Fabric Warehouse semantic models to monitor dispute volumes, fraud detection rates, and credit portfolio performance with row-level security and drill-through capabilities.
Concordia University Chicago
Cloud AI & Data Engineer
Chicago, IL, USA
- Architected a production-grade Service Desk Copilot using Azure AI Foundry and RAG (Retrieval-Augmented Generation), reducing ticket volume by delivering citation-backed answers from internal runbooks.
- Engineered automated document processing workflows using Azure AI and JSON parsers to extract key data fields from unstructured finance documents for downstream reporting.
- Developed comprehensive Power BI dashboards to visualize operational KPIs, utilizing DAX and Power Query to identify trends in system usage and support efficiency.
- Secured cloud infrastructure by implementing Role-Based Access Control (RBAC) and policy governance within Microsoft Entra ID for faculty and staff systems.
Concordia University Chicago
Data Operations & Cloud Analyst
Chicago, IL, USA
- Optimized university IT workflows by analyzing system log data using SQL and Power BI, identifying bottlenecks in the ticketing lifecycle.
- Managed Azure Active Directory (Entra ID) user identities and access policies, ensuring 99.9% uptime for student and faculty portal access.
- Collaborated with cross-functional teams to migrate on-premises data to cloud storage, validating data integrity through SQL scripting and automated quality checks.
- Created automated reporting scripts using PowerShell and Python to track license usage and cloud resource consumption, reducing operational waste.
LTIMindtree LTD. (Microsoft Vendor)
Senior Data Engineer
Hyderabad, India
- Architected metadata-driven ingestion frameworks using Azure Data Factory, orchestrating data movement across ADLS Gen2, Synapse Analytics, and Snowflake for insurance and Xbox sales domains.
- Designed dimensional data models (star/snowflake schemas) with SCD Type 1/2 in Azure Synapse and Snowflake, enabling tracking of claims efficiency, sales velocity, and regional revenue.
- Developed event-driven processing solutions using Azure Event Hubs and Stream Analytics, reducing reporting latency from hours to under 15 minutes for time-sensitive business decisions.
- Implemented comprehensive data quality frameworks including source-to-target validation, schema conformance checks, and duplicate detection, reducing data-related production incidents by ~40%.
- Managed platform security using Azure Key Vault, implemented row-level security in Synapse Analytics, and configured RBAC across ADLS Gen2 to comply with enterprise governance standards.
Mindtree (Microsoft Vendor)
Data Engineer
Mumbai, India
- Engineered 30+ scalable ETL/ELT pipelines using Azure Data Factory, processing 5+ TB of transactional data daily with a 99.5% pipeline success rate across insurance and Xbox sales domains.
- Built PySpark and Spark SQL transformations on Azure HDInsight and Synapse Spark pools, improving data processing throughput by ~35% through partition pruning, broadcast joins, and caching.
- Built enterprise-grade data ingestion from SQL Server, MySQL, APIs, JSON, and Kafka into Bronze/Silver/Gold zones within ADLS Gen2 following medallion architecture with Delta Lake.
- Created business-facing datasets and reporting feeds consumed by Power BI and Tableau dashboards, collaborating with analysts to translate business requirements into technical designs.
- Managed CI/CD deployment practices using Azure DevOps and Jenkins across dev, QA, staging, and production environments with ARM template parameterization and release gate approvals.
Bosch
Data Engineer Intern
Bangalore, India
- Engineered end-to-end IoT telemetry ingestion pipelines using Kafka producers/consumers in Python and Scala on AWS, enabling real-time streaming of high-frequency industrial sensor data.
- Developed Spark Streaming applications on AWS EMR to process raw telemetry events, persisting to HBase for operational lookups and S3 data lake zones for batch analytics.
- Implemented AWS Kinesis with Lambda functions for real-time anomaly detection, triggering SNS alerts when sensor thresholds were breached and cutting incident response time to near real-time.
- Built PySpark batch jobs on EMR to process terabytes of historical IoT sensor datasets, performing time-series aggregations to support predictive maintenance analytics.
- Designed dimensional data models in AWS Redshift for machine performance metrics, enabling stakeholders to track equipment efficiency and downtime patterns through BI dashboards.
My Education
Swami Vivekananda Institute of Technology
Hyderabad, India
Bachelor of Technology in Electronics and Communication Engineering. I immediately found a job as a Data Engineer.
2020
Concordia University Chicago
River Forest, IL
Graduated with Master's Degree, Computer Science.
Aug 2023 - Dec 2025
My Certifications











Contact me
Please contact me directly at manalarajkumar.rm@gmail.com or through this form.








