CFOCoder
Notes on data engineering, devops, and building things that work.
-
From Native Installation to a More Stable Hadoop + Hive Stack with Coolify
In the previous posts of this series, I installed Hadoop 3.3.6 natively on Ubuntu, configured YARN, ran MapReduce jobs, installed Apache Hive 3.1.3 on top of Hadoop, loaded external tables from HDFS,...
Hadoop
-
Querying Apache Hive from DBeaver: Starting HiveServer2 and Connecting a Desktop SQL Client
In the previous posts of this series, I installed Hadoop 3.3.6 natively on Ubuntu, configured YARN, ran MapReduce jobs, installed Apache Hive 3.1.3 on top of Hadoop, and finally loaded CSV files into...
Hadoop
-
From HDFS to SQL Queries: Loading CSV Files into Hive External Tables and Querying with SQL
When I completed the installation of Hadoop 3.3.6 and Apache Hive 3.1.3 on my Ubuntu machine, I had everything running smoothly. But then came a practical question that every data engineer faces: How...
Hadoop
-
Restic + MinIO for OpenClaw: What It Is, What It Solves, and the Quick Reference I Wanted Yesterday
Yesterday I spent part of the day optimizing my OpenClaw setup and cleaning up the way I protect its operational state.
Linux
-
Building a Modern Frontier Data Stack: Hadoop 3.4.3, Hive 4.2.0, and MinIO S3 Integration in 2026
A few days ago, I published posts about how to install Hadoop 3.3.6 natively on Ubuntu. At that time, I thought it was the state of the art. But things in the Big Data world move fast.
Hadoop
-
Apache Hive 3.1.3 on Ubuntu: Native Installation on Top of Hadoop 3.3.6
In Part 1 of this series, I installed Hadoop 3.3.6 natively on Ubuntu 24.04 and configured HDFS in pseudo-distributed mode. In Part 2, I configured YARN and ran the canonical WordCount job on War and...
Hadoop
-
Correcting Word Frequencies with Data Normalization: MapReduce Text Processing on War and Peace — Part 3
In Part 1 of this series, we installed Hadoop 3.3.6 natively on Ubuntu and configured HDFS for distributed storage. In Part 2, we configured YARN, wrote our first MapReduce program (WordCount), and...
Hadoop
-
Running Your First MapReduce Job on Hadoop: WordCount on War and Peace
In Part 1 of this series we installed Hadoop 3.3.6 natively on Ubuntu 24.04 and got HDFS running in pseudo-distributed mode. That gave us a working distributed file system, but Hadoop is much more...
Hadoop
-
Hadoop 3.3.6 on Ubuntu: Native Installation without Virtual Machine
When I started working with Hadoop in a learning environment, the course guide indicated using Linux Mint in a virtual machine. However, I already had Ubuntu 24.04 installed natively on my Dell...
Hadoop
-
PersonaPlex: Mastering Conversational English with NVIDIA and RunPod
AI
-
Maximizing Value: How I Optimize GitHub Copilot Pro and Anthropic Subscriptions for Coding and Research
As a data scientist and developer, I rely on advanced LLMs (Large Language Models) like Claude Opus, Sonnet, GPT-4.1, and GPT-4o for both architectural planning and daily coding. But I quickly...
AI
-
Time Series Forecasting with Exogenous Variables: SARIMAX vs Prophet
Last quarter, I needed to forecast SMS volume for budget planning. The catch? Our SMS volume directly depends on how many locations we operate—and that number is also growing. I couldn’t just...
Data Science
-
From Zero to Hero: dbt with Ibis Framework
After completing the Complete dbt Data Build Tool Bootcamp, I realized dbt brings software engineering best practices to data transformation. But what if you could combine dbt’s structure with...
Data Science
-
Cypher Quick Reference Guide
A comprehensive quick reference for Cypher, the query language for graph databases. This guide focuses on Cypher syntax and patterns that work across Neo4j, Kuzu, and other graph databases supporting...
Data Science
-
Cloud-Powered Dictation: Fast STT for Old Linux Hardware with Copilot & Claude
I work daily with GitHub Copilot and Claude Code, and like many developers, I often think faster than I type. Speech‑to‑Text (STT) is an obvious productivity multiplier — but there’s a catch:
Cloud, Linux
-
Setting Up Passwordless SSH Across All Machines in Your Tailscale Network
If you have multiple machines connected through Tailscale, you’ve probably found yourself typing SSH passwords repeatedly when jumping between systems. In this guide, I’ll show you how to set up...
Cloud
-
How Cloudflare Tunnel Became a Game Changer for My Self-Hosted Setup
Prerequisites
Cloud
-
RunPod: The Cloud GPU Solution for Data Science Students
With over 500,000 developers using the platform, RunPod has become a popular choice for:
AI
-
Mastering Internal Networking in Coolify: Connecting n8n, Flowise, and OpenSearch Like a Pro
If you run a self-hosted stack on Coolify (v4), you likely have powerful AI tools like n8n, Flowise, LangFlow, and OpenSearch running side-by-side. But there is a catch: out of the box, Coolify...
AI
-
How to Install Langflow on Oracle ARM (Ubuntu) using Coolify: The “Proper” Way
Deploying AI tools like Langflow on self-hosted hardware (like the generous Oracle Cloud ARM Free Tier) gives you total control over your data and costs. While tools like Coolify make deployment...
AI
-
Self-Hosting a Vector Database: OpenSearch on Oracle ARM with Coolify
If you are building AI applications, RAG (Retrieval-Augmented Generation) pipelines, or just need a powerful search engine, you need a Vector Database. While services like Pinecone are great, they...
AI
-
Six Practical Ways to Install Apps on Ubuntu (Beginner-Friendly Guide)
Ubuntu offers several ways to install applications. Understanding each method helps you pick the right tool, avoid duplicates, and maintain a clean system. This guide covers six installation methods,...
Linux
-
Running Local Notebooks on Databricks Using VS Code
Execute Jupyter notebooks locally in VS Code while leveraging Databricks compute infrastructure. Your code runs on Databricks’ serverless compute, but you edit and manage files locally—giving you the...
Data Science
-
Markmap: Convert Markdown to Interactive Mindmaps
Markmap is a simple yet powerful tool that transforms markdown files into interactive, visual mindmaps. It eliminates the need for complex mindmapping software—just write markdown, and Markmap...
Data Science
-
Deploying MCPO with Dockerfile on Coolify
MCPO (Model Context Protocol OpenAPI Proxy) is a tool that exposes MCP server tools as OpenAPI endpoints, making them easy to integrate with platforms like Open WebUI. I previously wrote a detailed...
AI
-
Converting the Mexican Constitution PDF to Markdown with Docling
This tutorial demonstrates how to use Docling to convert PDF documents to Markdown, JSON, and other formats. We’ll use the Mexican Constitution (Constitución Política de los Estados Unidos Mexicanos...
AI
-
Docling Chunking Tutorial: Preparing Documents for RAG
1. BaseChunker
AI
-
Installing Coolify on an Oracle ARM Ubuntu server
Coolify is an open-source, self-hostable Platform-as-a-Service (PaaS)—think “Heroku/Vercel, but free on your own server”. Deploy applications, databases, and services with one click, automatic SSL,...
Linux
-
PM2: Complete Reference Guide
PM2 is a production-grade process manager that helps you keep applications running continuously. While it was originally designed for Node.js applications, PM2 can manage any type of application or...
Linux
-
Complete Guide: Installing Gurobi Optimizer on Oracle ARM Ubuntu
This guide will walk you through the complete process of installing Gurobi Optimizer with a Web License Service (WLS) license on an Oracle Cloud Infrastructure (OCI) server running Ubuntu on ARM64...
Optimization
-
LaTeX on Ubuntu ARM: Quick Setup & Reference Guide
LaTeX (pronounced “LAH-tech” or “LAY-tech”) is a high-quality typesetting system designed for the production of technical and scientific documents. Unlike traditional word processors like Microsoft...
Data Science
-
Exposing Local Projects to the Web with Cloudflared: A Quick Guide
Ever needed to quickly share a local project with someone without deploying it? Cloudflared makes it incredibly easy to expose your local development server to the web temporarily. Whether you’re...
Linux
-
🚀 Installing and Configuring MCPO for Open WebUI: A Complete Guide
Today I successfully set up MCPO (MCP-to-OpenAPI) to work seamlessly with Open WebUI, providing access to powerful external APIs through the Model Context Protocol. This post documents the entire...
AI
-
Installing AnythingLLM on Oracle ARM Ubuntu Server
sudo mkdir -p /var/www/html/anythingllm/storage
AI
-
Building Your First MCP Server: A Journey from API to AI Assistant
As someone who works at the intersection of data science and finance, I’m always looking for ways to make economic data more accessible. When I discovered the Model Context Protocol (MCP), I saw an...
AI
-
The Powerful COPY Command in DuckDB / MotherDuck: A Quick Reference Guide
The COPY command in DuckDB and MotherDuck is a versatile tool for importing and exporting data. This guide provides a concise overview of how to use COPY both from the DuckDB CLI (SQL only) and from...
Data Science, Python, SQL
-
Building a Complete DuckLake Solution: From Local Development to Cloud Production
DuckLake is revolutionizing the lakehouse architecture by combining the simplicity of DuckDB with the power of modern data lake formats. In this comprehensive guide, I’ll walk you through building a...
Data Science, SQL
-
A SysAdmin’s Complete Guide: From Crisis to Clean Server – The Ultimate Disk Space Recovery Playbook
This is the comprehensive story of how I’ve evolved from reactive firefighting to proactive server management. What started as a “simple” file sync issue revealed a server at 93% disk capacity,...
Linux
-
How to Set Up a Powerful, Free Forever Server on Oracle Cloud (Caddy Edition)
Earlier in 2025, I signed up for Oracle’s “Free Forever” cloud offer. It was, and still is, one of the most generous free tiers available, especially for developers and hobbyists. I wrote a blog post...
Linux
-
The Ultimate Guide to Installing and Using Gemini CLI on Windows, Linux, and VS Code
In the fast-paced world of software development, efficiency is everything. What if you could bring the power of a cutting-edge AI directly into your terminal, ready to answer questions, explain code,...
AI
-
A Step-by-Step Guide to Installing Portainer with Docker and Caddy on an ARM Server
Managing a server with multiple Docker applications can quickly become a juggling act of docker ps, docker logs, and docker-compose commands. While powerful, the command line isn’t always the most...
Linux
-
Supercharging Your Cloud Server Management: Mountain Duck + Tailscale + ARM Ubuntu Done Right!
Managing files on a remote server can sometimes feel like navigating a maze. Public IPs, firewalls, SSH keys – it’s a lot to keep track of. But what if I told you there’s a “super cool” way to get...
Linux
-
Conquering “Permission Denied”: A Quick Guide to Ubuntu File Permissions
Ever been in the middle of setting up a new project, run a command, and then BAM! You’re hit with the dreaded Permission denied (os error 13)? It’s a common stumbling block, especially when dealing...
Linux
-
Setting Up SSH Keys for GitLab and GitHub: A Complete Guide
As a developer, I prefer to use GitLab for my private projects and repositories due to its generous free tier for private repos, while using GitHub for public open-source projects where the community...
Linux
-
Unleash the Power of Symbolic Math in Python: A Data Scientist’s Quick Guide to SymPy
As a Data Science Masters student, I’m constantly working with mathematical concepts. From the calculus behind gradient descent to the linear algebra that powers PCA, math is the bedrock of...
Data Science
-
Deploying Flowise on a Secured Subdomain with Caddy, Docker, and Oracle ARM VM
In the rapidly evolving landscape of AI, tools like Flowise are democratizing access to powerful Large Language Model (LLM) capabilities. Flowise offers a user-friendly, drag-and-drop interface to...
Cloud
-
Self-Hosting Supabase on Oracle ARM (Ubuntu 24.04) with Caddy & Docker Compose: A Step-by-Step Guide
So, you’ve got a powerful Oracle Cloud ARM VM running Ubuntu 24.04, complete with SSL, Docker, and a Caddy reverse proxy neatly managing your subdomains. Your blog is humming along on the main...
Cloud
-
My Ultimate Snowflake Quick Reference Guide
In today’s data-driven world, the ability to effectively store, process, and analyze vast amounts of information isn’t just an advantage – it’s a necessity. For years, organizations grappled with...
Cloud
-
Installing Mautic 6 on Oracle ARM Ubuntu 24.04 with Caddy & Docker
Marketing automation is a powerful tool for businesses looking to nurture leads, engage customers, and streamline marketing efforts. Mautic stands out as a leading open-source marketing automation...
Linux
-
Installing Nextcloud on Oracle ARM Ubuntu 24.04 with Docker and Caddy
In today’s data-driven world, managing files, collaborating securely, and maintaining control over your digital assets is paramount. Nextcloud is a powerful, open-source, self-hosted productivity...
Cloud
-
Self-Hosting MinIO S3 Storage on a Mac Mini with Docker and Tailscale
Want your own private, S3-compatible object storage accessible securely from anywhere? Running MinIO on an always-on Mac Mini combined with Docker and Tailscale is a fantastic solution. This post...
Cloud
-
My Journey: Taming VPN Conflicts and Securing Server Access with Tailscale
Ever found yourself constantly connecting and disconnecting your regular VPN just to SSH into your own server? I certainly did. It was a daily annoyance: fire up the privacy VPN (like ProtonVPN in my...
Linux
-
Installing RedPanda Streaming Data Platform on Oracle ARM Ubuntu 24.04 with Docker and Caddy
In the world of data, moving information quickly and reliably is crucial. Enter RedPanda, a modern streaming data platform designed for simplicity and performance. Think of it as...
Linux
-
Installing Apache Airflow with Docker and Caddy on Oracle ARM (Ubuntu 24.04)
Running data pipelines often requires a robust orchestrator like Apache Airflow. Setting it up on modern ARM-based cloud infrastructure, such as Oracle Cloud’s Ampere A1 instances, can be efficient...
Linux
-
Installing Self-Hosted Airbyte on Oracle ARM Ubuntu 24.04 with Caddy & Cloudflare
Airbyte is a powerful open-source data integration platform, allowing you to sync data between various sources and destinations. Self-hosting Airbyte gives you full control over your data pipelines....
Linux
-
Self-Hosting ChromaDB on Oracle ARM Ubuntu with Docker, Caddy & Cloudflare
ChromaDB is a powerful open-source embedding database, essential for building AI applications involving semantic search, retrieval-augmented generation (RAG), and more. While Chroma offers a managed...
Linux
-
Self-Hosting Open Web UI on Oracle ARM with Docker and Caddy (Private Setup)
Running your own Large Language Model (LLM) interface offers fantastic benefits like privacy, customization, and potentially lower costs compared to hosted services. Open Web UI is a popular,...
Linux
-
The Self-Hosted Power Trio: Integrating Airflow, Airbyte, and n8n for Ultimate Data & Automation
In today’s data-driven world, moving information, processing it, and acting upon insights are critical. But stitching together these processes can be complex. Thankfully, a powerful trio of...
Cloud
-
Installing Matomo Analytics with Docker, Caddy, and Cloudflare on ARM Ubuntu
Matomo is a powerful, open-source alternative to Google Analytics that gives you full ownership of your website’s traffic data. This guide walks through installing Matomo on its own subdomain using...
Linux
-
Install and Configure Docker on Oracle ARM (Ubuntu 24.04) – Optimize Storage!
Oracle Cloud Infrastructure (OCI) offers powerful and cost-effective ARM Ampere A1 instances. Running Ubuntu 24.04 LTS (“Noble Numbat”) on these instances is a popular choice. If you’re planning to...
Linux
-
Install WordPress with Caddy & Auto-SSL on Oracle ARM Ubuntu 24.04 (Multi-Site Ready!)
Setting up a WordPress blog or website on an efficient Oracle Cloud ARM instance is a great way to get performance on a budget. Combining it with the modern Caddy web server provides automatic HTTPS,...
Linux
-
Installing Qdrant Vector Database on Oracle ARM Ubuntu with Docker, Caddy & Cloudflare
This guide details how to install the Qdrant vector database on an Oracle Cloud Infrastructure (OCI) ARM Ampere VM running Ubuntu 24.04. We’ll use Docker Compose for easy management, Caddy as a...
Linux
-
Avoiding the Pitfalls: A Smoother Guide to Restoring OCI Instances
Restoring a virtual machine instance from a custom image or backup in Oracle Cloud Infrastructure (OCI) should be straightforward. You click a few buttons, and presto, your server is back, right?...
Linux
-
Hosting WordPress on Oracle ARM Ubuntu 24.04: Nginx, Certbot & Root Domain Setup
So you’ve snagged one of Oracle Cloud’s awesome “Always Free” Ampere A1 (ARM) Compute instances running Ubuntu 24.04 LTS. Fantastic! Now, you want to host a WordPress blog on your main domain...
Blog
-
Running Your Own AI: Installing Ollama on Ubuntu 24.04 ARM with Docker and Caddy
Self-hosting Large Language Models (LLMs) is becoming increasingly accessible, offering benefits like privacy, cost savings, and customization. Ollama makes it incredibly simple to run open-source...
Linux
-
Installing Odoo 18 with Docker Compose and Caddy on Ubuntu 24.04 (Oracle ARM) – The Easy Way (After Hitting a Snag!)
Odoo is a fantastic suite of open-source business apps, covering CRM, ERP, accounting, inventory, and more. Running it yourself gives you full control, and Docker makes the deployment process much...
Cloud
-
Guide: Self-Hosting QuickChart on ARM with Docker (Local Build) & Caddy
This guide provides step-by-step instructions to install a self-hosted QuickChart instance on an ARM-based server (like Oracle Cloud Ampere A1) running Ubuntu 24.04, using Docker Compose and Caddy.
Data Science
-
Adding and Mounting a Block Volume in Ubuntu (with systemd)
This guide explains how to add a new block storage volume (like an additional hard drive or a cloud-provided block storage device) to an Ubuntu system, format it, and mount it permanently. We’ll...
Linux
-
Installing n8n with Docker Compose and Caddy on Ubuntu
Workflow automation tools are incredibly powerful, and n8n is a fantastic open-source, self-hostable option. It allows you to connect various apps and services to automate tasks visually.
Linux
-
Installing OpenBB Platform API on Oracle ARM (Ubuntu 24.04) with Docker & Caddy
OpenBB has evolved. While known for its powerful terminal and, previously, a self-hostable web application, the core of the self-hosted OpenBB Platform is now its robust set of API endpoints. This...
Data Science
-
Free Forever Oracle Virtual Private Server
I recently discovered a super cool offer from Oracle to set up a free-forever Virtual Machine instance with an ARM processor, 4 OCPUs, 24 GB of RAM and 200 GB of storage. So this guide is to...
Linux
-
Time Series Forecasting
Time series is a topic I’m very passionate about, because for many years I have worked on preparing forecasts and budgets for revenue, expenses and headcount in Excel, but so far, I...
Data Science
-
Unraveling the Differences: Data Science, AI, and Data Engineering
I recently met with a group of colleagues from the accounting firm where I worked many years ago (KPMG), and I found it hard to explain to them what I do in the field of Data Science and what...
Data Science
-
Git Quick Reference Guide
This is a quick reference guide to Git that I keep at hand. I compiled it from the official documentation at https://git-scm.com/docs
Data Science
-
Azure Databricks Quick Reference Guide
Databricks is an analytics and data engineering platform that sits on top of Spark, an analytics engine for big data processing and machine learning. Spark performs in-memory processing using a...
Data Science
-
Python Practical Reference Guide
This is a Python reference guide that I wrote for myself, with code samples so I can remember how to write them. I embedded a Jupyter notebook inside this blog post to test them.
Python
-
KQL and Azure Data Explorer – Reference Guide
Recently I’ve been learning a new database query language, KQL, which stands for “Kusto Query Language”. It is the language used by Azure Data Explorer, a tool in Microsoft’s Azure Cloud that helps...
SQL
-
Advanced SQL Server Guide
This is a short guide to advanced SQL Server topics that I recently learned in this Udemy course. The examples in this post use the AdventureWorks2019 sample database provided by...
SQL
-
SQL Server Integration Services (SSIS) Guide
Here is a guide to SSIS.
SQL
-
NetStat Cheat Sheet
Here is a quick cheat sheet about NetStat that I compiled from the video shared at the bottom of this post. NetStat is a very useful tool for monitoring network connections and ports, among other...
Linux
-
How to Setup an Azure Virtual Machine for Python Development
As part of the Data Science course I’m currently studying at Tec de Monterrey, I learned how to do remote Python development using GitHub Codespaces for free. I found this to be an eye-opener as it...
Cloud
-
Cloud Services Comparison (AWS vs Google vs Azure)
This is a quick reference guide comparing the basic cloud services of the three top providers: Microsoft Azure, Google Cloud, and Amazon AWS. This compilation was made based on two...
Cloud
-
How to Connect to an Amazon EC2 Virtual Machine from Windows 10
Connecting to an Amazon EC2 virtual machine from Linux is easy: all we have to do is download the PEM key file into the project folder, change its permissions with the chmod command as in the...
Cloud
-
How to Install Linux Ubuntu on Windows
I recently learned how to install Ubuntu on my Windows 10 laptop, not as a virtual machine, but as part of my normal Windows installation, so I have the advantage of not wasting resources dedicated...
Linux
-
Docker and Kubernetes Guide
Additional parameters to run a container
Cloud
-
Guide to YAML
YAML is a data serialization language, which is very easy for a human to understand and is widely used in the configuration files of different applications.
Cloud