
AWS Certified Data Engineer – Associate
Cloud Data Engineer
I am Vedavyas. Thanks for dropping by my portfolio site!
As a passionate Data Engineer, I bring extensive experience in building scalable, efficient data pipelines and architectures that support complex data analysis and machine learning models. Skilled in a wide range of technologies including Python, SQL, Apache Spark, and Hadoop, I excel in designing and implementing robust solutions for big data challenges. My expertise extends to cloud platforms such as AWS and Azure, where I've deployed and managed data lakes and warehouses, optimizing for performance and cost-effectiveness.
Having recently graduated in May 2025 from the prestigious Indiana University Bloomington, I am enthusiastically preparing to embark on my professional journey in the dynamic field of Cloud Data Engineering. The experiences I've gained thus far have honed a diverse skill set that I am eager to leverage in tackling real-world challenges in the ever-evolving tech landscape. I am also open to relocation. If you are in search of a dedicated and motivated Cloud/Data Engineer, poised to learn and make a substantial impact, let's connect!
Technical Skills:
Programming & Databases: Python, R, MySQL, PostgreSQL, MongoDB, Snowflake, Amazon DynamoDB, Apache Cassandra, PrestoDB, Neo4j Graph Database
Cloud & Big Data Technologies: AWS Glue, Amazon Redshift, Amazon Athena, AWS EMR, Amazon S3, Azure Databricks, Azure Data Factory, Azure Synapse Analytics, Azure Event Hubs, Google BigQuery, Google Cloud Storage
Data Engineering & DevOps: Apache Spark, Apache Hadoop, Apache Kafka, Apache Hive, Apache Airflow, PySpark, Pandas, NumPy, dbt (data build tool), Docker, Kubernetes, Jenkins, Git, Celery, ETL/ELT Pipelines, Data Modeling, Spark Streaming
Business Intelligence & Analytics: Tableau, Microsoft Power BI, Amazon QuickSight, scikit-learn, TensorFlow, PyTorch, Random Forest, XGBoost, SVM, Linear Regression, Logistic Regression, Hypothesis Testing, Dimensional Modeling, Statistical Analysis
Web Development & Tools: Django, Flask, Linux, React.js, HTML5, CSS3, JavaScript, Postman, Redis, RESTful APIs, FastAPI
Coursework: Big Data Applications, Advanced Database Concepts, Applied Algorithms, Applied Database Techniques, Software Engineering, Computer Networks, Data Mining
Coursework: Data Structures, Database Management Systems, Computer Organization and Architecture, Operating Systems, Cloud Computing, Linux Programming
I love doing certifications. Below are a few that I have pursued so far, and I am working on a couple more in parallel as you read this.
Skills learned and applied: AWS Glue, Athena, Redshift, DynamoDB, EC2, Amazon Kinesis and many other AWS Data Engineering services
Skills learned and applied: Apache Spark, Delta Lake, Databricks, Lakehouse, Delta Live Tables, Data Pipelines, ETL, Production, SQL, Python
Skills learned and applied: Prepare the data, Model the data, Visualize and analyze the data
Skills learned and applied: Apache Airflow, DAGs, Data Pipelines, Orchestration, Scheduling
Skills learned and applied: Spark architecture, Spark SQL functions, UDFs, DataFrames, Adaptive query execution, Python
Skills learned and applied: Architecting Solutions on AWS, AWS Cloud Best Practices, Building Infrastructure on AWS
Skills learned and applied: Cloud concepts, Azure architecture and services, Azure management and governance, Describe core data concepts, Identify considerations for relational data on Azure, Describe considerations for working with non-relational data on Azure, Describe an analytics workload on Azure
Individuals who successfully complete the Confluent Fundamentals Accreditation demonstrate an understanding of Apache Kafka and Confluent Platform. They are able to explore use cases, have general knowledge of Kafka's core concepts, and understand Kafka's capabilities as a highly scalable, highly available, and resilient real-time event streaming platform.
Skills learned and applied: Architecting Solutions on AWS, AWS Academy, AWS Cloud, Building Infrastructure on AWS, Web Applications
Skills learned and applied: AWS Architecture, AWS Cloud, AWS Core Services, AWS Pricing, AWS Support
Below are a few hands-on projects that I worked on.
This project detects fraudulent credit card transactions, helping to prevent identity theft, which has skyrocketed in the current era. I used AWS S3, AWS EMR, Hive, Hadoop, MongoDB, PySpark, and Apache Kafka to build it, and I've attached all the files needed to reproduce it in the Media section. The project flags a transaction as fraudulent using the following checks: (1) the transaction amount exceeds the Upper Control Limit (UCL) computed over the card's last 10 transactions; (2) the cardholder's credit score is less than 200; (3) the geolocation of each transaction is captured, and the distance and elapsed time between consecutive transactions are calculated; if covering that distance in that time would require travelling faster than 900 km/h, the transaction is flagged as fraudulent, otherwise it is treated as genuine.
Skills learned and applied: AWS(S3, EMR), Hadoop, Hive, MongoDB, PySpark, Apache Kafka
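The three rules above can be sketched in plain Python. This is a simplified, hypothetical version: the field names are illustrative, and the UCL here assumes the common "moving average plus three standard deviations" definition, which may differ from the project's exact formula.

```python
from dataclasses import dataclass
from math import radians, sin, cos, asin, sqrt
import statistics

@dataclass
class Txn:
    amount: float
    lat: float
    lon: float
    ts: float  # transaction time, unix seconds

def ucl(last_10_amounts):
    # Upper Control Limit over the card's last 10 transactions.
    # Assumed definition: moving average + 3 standard deviations.
    mean = statistics.mean(last_10_amounts)
    sd = statistics.pstdev(last_10_amounts)
    return mean + 3 * sd

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two geolocations, in kilometres.
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def is_fraud(txn: Txn, prev: Txn, last_10_amounts, credit_score) -> bool:
    if txn.amount > ucl(last_10_amounts):   # rule 1: amount above UCL
        return True
    if credit_score < 200:                  # rule 2: low credit score
        return True
    dist = haversine_km(prev.lat, prev.lon, txn.lat, txn.lon)
    hours = (txn.ts - prev.ts) / 3600
    if hours > 0 and dist / hours > 900:    # rule 3: implied speed above 900 km/h
        return True
    return False
```

In the actual project these checks run over a Kafka stream with PySpark, with card history and credit scores looked up from the Hive/MongoDB stores; the sketch only shows the decision logic.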
This project involves building an end-to-end data pipeline for Spotify data using AWS services. It encompasses extracting data from the Spotify API, storing it in AWS S3, and implementing automated transformation processes using AWS Lambda. The pipeline includes scheduled data extraction, data cleaning and formatting, and automated triggers for transformation based on data updates. The transformed data is then stored back in S3 with proper organization. The project also leverages AWS Glue and Athena for creating analytics tables and enabling efficient querying. Key skills learned from this project include working with AWS services (S3, Lambda, Glue, Athena), API integration, data extraction and transformation, automated pipeline development, cloud-based data storage and organization, and setting up data analytics infrastructure. This comprehensive solution provides a scalable and automated approach to processing and analyzing Spotify data, offering valuable insights for various analytical purposes.
Skills learned and applied: Application Design, Python, Infrastructure As Code, Amazon Web Services, AWS Elastic Cloud Compute, AWS Relational Database Service, AWS Simple Storage Service
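The extract-and-transform steps of the pipeline above might look roughly like the following Lambda sketch. This is a hypothetical illustration, not the project's actual code: the bucket name, key prefix, environment-variable names, and output columns are all assumptions, and only the response shape of the Spotify Web API playlist-tracks endpoint is relied on.

```python
import json
import os
import datetime
import urllib.request

def flatten_tracks(raw: dict) -> list:
    # Flatten the Spotify playlist-tracks response into rows that
    # Glue can catalog and Athena can query (columns are illustrative).
    rows = []
    for item in raw.get("items", []):
        track = item.get("track") or {}
        rows.append({
            "track_id": track.get("id"),
            "track_name": track.get("name"),
            "artists": ", ".join(a.get("name", "") for a in track.get("artists", [])),
            "album": (track.get("album") or {}).get("name"),
            "added_at": item.get("added_at"),
        })
    return rows

def handler(event, context):
    # Runs on an EventBridge schedule (extraction) in the real pipeline;
    # a second Lambda triggered by the S3 put would do the transformation.
    import boto3  # provided by the Lambda runtime
    req = urllib.request.Request(
        f"https://api.spotify.com/v1/playlists/{os.environ['PLAYLIST_ID']}/tracks",
        headers={"Authorization": f"Bearer {os.environ['SPOTIFY_TOKEN']}"},
    )
    with urllib.request.urlopen(req) as resp:
        raw = json.load(resp)
    key = f"raw/to_process/spotify_{datetime.datetime.utcnow():%Y%m%d_%H%M%S}.json"
    boto3.client("s3").put_object(
        Bucket=os.environ.get("RAW_BUCKET", "spotify-pipeline-raw"),  # illustrative name
        Key=key,
        Body=json.dumps(raw),
    )
    return {"rows": len(flatten_tracks(raw)), "key": key}
```

Keeping extraction and transformation in separate Lambdas, connected by an S3 trigger, is what makes the pipeline event-driven rather than a single monolithic job.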
Retaining current employees is more difficult for an HR team than recruiting new ones. Any business that loses one of its valuable employees suffers losses in productivity, time, and money, among other factors. This loss could be reduced if HR were able to foresee which employees are considering leaving their positions, so in this project we addressed the employee turnover problem from a machine learning perspective. When the time comes to lay off workers as part of organizational changes, the company can also use churn modeling to make a rational decision rather than randomly selecting layoff candidates.
Skills learned and applied: Data collection, Data Mining, Data Pre Processing, Python, Pandas, Exploratory Data Analysis, Dimensionality Reduction, Machine Learning algorithms, Tuning ML model performance
I had the privilege of working on a transformative project in which I oversaw the implementation of Tableau for in-depth analysis of our Human Resources data. Every company has an HR department that handles various recruitment and placement tasks. In this project, I worked with a massive dataset to extract valuable insights that can help the HR department improve its work and better understand the recruitment landscape. The resulting dashboards empowered stakeholders with the insights needed for informed decision-making, allowing them to steer the business in the right direction and drive substantial improvements in overall performance.
Skills learned and applied: Data Collection, Data Cleaning, Data Transformation, Data Modeling, Data pre-processing, Information Visualization, Business Intelligence, Tableau
If you have any potential opportunity for me, or just want to get
in touch, please feel free to drop me an email. You can also reach
out to me on LinkedIn from the bottom-right corner of this page.
I look forward to connecting with other industry
professionals, sharing knowledge, and exploring new opportunities
in the data space. Because in this realm of information, a warm
welcome is just a JOIN operation away. 😊