Data Engineering Pipeline
News Analyzer
A robust ETL and summarization pipeline that transforms raw local news into concise AI-generated briefings. Built with Python, Playwright, and Kubernetes.
Loading visualization...
Live visualization of the data processing stages. Particles represent individual article batches flow.
Under the Hood
Automated Scraping
Uses Playwright to handle authentication and navigate dynamic e-edition pages. Handles PDF downloads and session management automatically.
Vector Storage
Extracted text is chunked and embedded, then stored in Qdrant and Weaviate for hybrid search capabilities.
AI Summarization
Leverages LLMs via LiteLLM to generate concise bullet-point summaries and extract key entities, ensuring relevance to local context.