Mandatory skills:
- 5+ years of programming experience in Python, with strong proficiency in the language
- Familiarity with functional programming concepts
- 3+ years of hands-on experience developing ETL data pipelines using PySpark on AWS EMR
- Hands-on experience with XML processing in Python
- Good understanding of Spark’s RDD API
- Good understanding of Spark’s DataFrame API
- Experience in configuring EMR clusters on AWS
- Experience with and good understanding of the Apache Spark Data Sources API
- Experience working with AWS S3 object storage from Spark
- Experience troubleshooting Spark jobs, including monitoring them with the Spark UI
- Performance tuning of Spark jobs
- Understanding of the fundamental design principles behind business processes
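As a hedged illustration of the XML-processing skill listed above, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. The `<orders>` document shape and the `parse_orders` helper are hypothetical examples, not part of the role's actual data.

```python
import xml.etree.ElementTree as ET

# Hypothetical order feed; real schemas in the role will differ.
XML_DOC = """
<orders>
  <order id="1001"><amount>250.00</amount></order>
  <order id="1002"><amount>75.50</amount></order>
</orders>
"""

def parse_orders(xml_text):
    """Parse an <orders> document into a list of (id, amount) records."""
    root = ET.fromstring(xml_text)
    return [
        (order.get("id"), float(order.findtext("amount")))
        for order in root.iter("order")
    ]

records = parse_orders(XML_DOC)
print(records)  # [('1001', 250.0), ('1002', 75.5)]
```

In a pipeline context, a function like this would typically be applied per record (for example, mapped over an RDD of XML strings) rather than run on a single document.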
Nice to have skills:
- Knowledge of the AWS SDK/CLI
- Experience setting up continuous integration/deployment of Spark jobs to EMR clusters
- Knowledge of scheduling Spark applications on AWS EMR clusters
- Understanding of the differences between Hadoop MapReduce and Apache Spark
- Proficient understanding of code versioning tools such as Git and SVN
Responsibilities:
- Design, development, and implementation of performant ETL pipelines using the Python API of Apache Spark (PySpark) on AWS EMR
- Writing reusable, testable, and efficient code
- Integration of data storage solutions in Spark, especially AWS S3 object storage
- Performance tuning of PySpark scripts
- Ensure high overall build quality and on-time delivery
- Handle customer meetings with ease
- Communicate clearly and effectively with customers
- Be a team player, willing to work in an onsite/offshore model and to mentor other team members, both onsite and offshore
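The "reusable, testable, and efficient code" expectation above can be sketched as follows: keep transformation logic in pure Python functions that can be unit-tested locally without a Spark cluster, then apply them through the RDD or DataFrame APIs. The record shape and the `normalize_record` helper here are hypothetical.

```python
def normalize_record(record):
    """Pure transformation: trim/lowercase the customer name and
    coerce the amount to float. Because it is Spark-agnostic, it can
    be unit-tested locally and reused inside rdd.map(normalize_record)
    or wrapped in a PySpark UDF on a cluster."""
    return {
        "customer": record["customer"].strip().lower(),
        "amount": float(record["amount"]),
    }

# Local unit test, no cluster required:
row = {"customer": "  ACME Corp ", "amount": "19.99"}
print(normalize_record(row))  # {'customer': 'acme corp', 'amount': 19.99}
```

Separating pure logic from Spark plumbing this way keeps CI fast (plain pytest) and leaves only the job wiring to be exercised on EMR.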
Send resume to adelina@jobs-n-jobs.com