Identifying the key problem in the system and proposing a solution
Data mining using state-of-the-art methods
Identifying third-party data that can be used to enhance the information gained
Processing, cleansing, and verifying the integrity of data used for analysis
• Strong Python coding skills
• Experience in advanced SQL
• Experience with AWS Cloud services such as Redshift, S3, EC2, RDS, and VPC
• Experience with GCP Cloud services such as Compute Engine, App Engine, Cloud SQL, and Cloud Storage
• Hands-on experience with the Talend ETL tool
• Strong hands-on experience in Linux and Windows environments for troubleshooting existing pipeline issues
• Experience with AWS EMR clusters or Google Dataproc clusters (an advantage)
Other Skills & Qualifications:
Proficiency in SQL is a must.
Experience with and knowledge of Big Data architectures.
Experience designing and constructing highly scalable data management systems
Experience with common data science toolkits, such as R, Python, or Java.
Hands-on experience in building real-time or batch-based data pipelines
Experience in building software components and analytics applications
Experience with NoSQL databases, such as MongoDB, Cassandra, HBase, or Redis, is an added bonus.
Experience in data mining exercises for business insights.
4+ years of experience in Data Engineering and Data Warehousing
Experience with Big Data, relational data, and unstructured databases
Data-oriented personality with the ability to logically break down data problems and find solutions
Highly curious self-starter who can work with minimal supervision and guidance
Bachelor's or Master's degree in a computing domain
Great communication skills