Responsibilities:
- Design, build, and maintain data pipelines using modern cloud platforms.
- Develop robust ETL/ELT workflows for structured and unstructured data.
- Optimize data storage, query performance, partitioning, and retrieval processes for large-scale datasets.
- Ensure data quality through validation checks, reconciliation, profiling, and automated monitoring.
- Collaborate with data analysts, business teams, and data scientists to understand functional requirements.
- Implement data governance, metadata management, lineage tracking, and compliance standards.
- Build integrations with APIs, databases, and third-party platforms.
- Perform performance tuning for SQL queries, Spark jobs, and distributed processing workloads.
- Develop reusable frameworks, coding standards, and CI/CD pipelines for data engineering solutions.
- Collaborate with cross-functional teams to design, develop, and deliver scalable data solutions.
- Perform continuous performance tuning, optimization, and capacity planning for data platforms and pipelines.
- Integrate LLM / Generative AI solutions with enterprise data sources using secure pipelines.
- Evaluate emerging AI data tools and recommend improvements to modern data platforms.
- Build and maintain data pipelines for AI/ML use cases, including feature engineering and model-ready datasets.
- Support deployment of AI-powered analytics, recommendation systems, and predictive solutions.
- Provide training, documentation, and ongoing support to end-users and internal stakeholders.
Desired Skills and Experience:
- Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field.
- 3+ years of hands-on experience in Data Engineering and cloud-based data platforms.
- Strong experience with Microsoft Azure technologies such as Azure Data Factory, Azure SQL, CI/CD pipelines, cloud infrastructure, ARM Templates, and Terraform.
- Hands-on experience with Databricks for data engineering, ETL pipelines, and big data processing.
- Solid Python programming skills and strong logical problem-solving ability.
- Strong expertise in SQL.
- Good understanding of data warehousing, data lake, and Lakehouse architectures.
- Hands-on experience with ETL/ELT tools, data integration, and pipeline orchestration.
- Strong knowledge of database design, data modelling, and performance tuning concepts.
- Understanding of data security, access control, encryption, and compliance best practices.
- Knowledge of data governance, metadata management, lineage, and data quality frameworks.
- Experience with version control tools such as Git and collaborative development workflows.
- Experience with monitoring, alerting, and logging tools for production systems.
- Familiarity with distributed data processing tools such as Apache Spark is preferred.
- Exposure to AI/ML data pipelines, feature engineering, and modern analytics solutions is an advantage.
- Familiarity with LLM / Generative AI concepts, vector databases, or RAG-based solutions is a plus.
About CG Infinity: Headquartered in Texas, CG Infinity is one of the fastest-growing software service companies in the region, with 300+ team members in Dallas, Houston, Albuquerque, Little Rock, and New Delhi, India. The company offers solutions tailored to the needs of individual clients, drawing on expertise in customer experience & CRM, application development & integration, production support & quality assurance, and data analytics & AI. CG Infinity's mission is to grow talent and develop life-long relationships with its customers. The company has been featured on the INC 5000 and The Best Places to Work lists in recent years.
Website: http://www.cginfinity.com
LinkedIn: https://www.linkedin.com/company/cginfinityinc/
Company size: 201-500 employees
Headquarters: Dallas, Texas
Founded: 1998
Specialties: Engineering, Software Development, Mobility, Integration, Connected Devices, Outsourcing, Salesforce, Cloud, Technology, Security, Industrial Internet of Things (IIoT), Retail, and Energy