Skip to main content
Data Engineering Pipeline

News Analyzer

A robust ETL and summarization pipeline that transforms raw local news into concise AI-generated briefings. Built with Python, Playwright, and Kubernetes.

Loading visualization...

Live visualization of the data processing stages. Particles represent individual article batches flow.

Under the Hood

Automated Scraping

Uses Playwright to handle authentication and navigate dynamic e-edition pages. Handles PDF downloads and session management automatically.

Vector Storage

Extracted text is chunked and embedded, then stored in Qdrant and Weaviate for hybrid search capabilities.

AI Summarization

Leverages LLMs via LiteLLM to generate concise bullet-point summaries and extract key entities, ensuring relevance to local context.