Articles
The latest content updates; feel free to browse.
Code Can Be Imperfect, Architecture Cannot Be Chaotic: AI Coding Practices in a Cross-Platform Team
Over the past six months, I've been deeply using AI programming in an Android + iOS cross-platform project. Logic is highly aligned across both platforms, with a cumulative code volume in the hundreds of thousands of lines. Honestly, I stumbled through plenty of pitfalls along the way, but also worked out some fairly reliable methodologies. This article isn't a tutorial on how to use AI tools—there's already enough of that content online. What I want to discuss is the shift in mindset: when AI becomes the primary code producer, what adjustments should engineering practices make?
Deep Research Agent in Practice (Part 3): Building Multi-Agent Systems
Come to think of it, this series has reached its final installment. In the first two articles, we discussed the basic architecture and practical applications of Deep Research Agents. Some friends left comments asking: "Can you talk about more complex scenarios?"
The 30-Minute Boundary of AI Workflows
Today I spent the entire day messing around with AI workflows—BMAD, ralph-loop, planning-files, helloAgent, taking turns one after another. The result? Still stuck. Just as I was questioning my life choices, I came across an article from Metr.org that woke me right up.
Deep Research Agent in Practice (Part 2): Building a Basic Research Agent
Hey friends, welcome back to our "Deep Research Agent" practice series! In the previous article, we explored the core concepts behind Research Agents. Today, we're getting serious—rolling up our sleeves and building our own basic research agent step by step! Iterating on a product is always exciting, isn't it? This time, we'll not only dive deep into the internal structure of two core workflows, but also package the final result into a complete service with both frontend and backend.
Deep Research Agent in Practice (Part 1): Guide to Architecture Design and Evaluation System Building
Hey everyone! Have you been bombarded with all kinds of powerful Agent applications lately? Have you ever wondered how a 'Deep Research' Agent is actually built? Today, let's deconstruct the high-quality course "Deep Research from Scratch" officially released by LangGraph. Forget about code for now — let's understand the architecture design and evaluation system of a top-tier Research Agent! This course is extremely high quality. Relying on just one external tool — Tavily (a search tool) — it implements a powerful deep research Agent. Even better, what it teaches us is not just code, but the prompt design principles and Agent evaluation system behind it. The content is incredibly valuable.
GitHub Official Release! spec-kit: The Swiss Army Knife for Project Development
AI coding is evolving rapidly, with new AI coding tools emerging every week that amaze me. Remember last time when I shared the AI programming workflow with everyone? Recently, I found an even better tool that can better generate code implementations for startup projects — that's spec-kit, open-sourced by GitHub in September. Today, let's dive into just how amazing it really is. What is Spec-Driven Development?
AI as Your Assistant: Delivering an MVP in 3 Days! A Practical Review of Controllable AI Programming
Friends! Want to try being the boss of AI, letting it serve as your "Product Director" and "Chief Engineer," delivering an MVP product from scratch in just 3 days? Today, let's dive deep into a new topic: Controllable AI Software Engineering. In the past year, AI programming tools have emerged one after another, from Cursor, Qoder, Trea to the command-line-based Claude Code — it's simply dazzling. As an experienced programmer, the explosion in productivity driven by AI's increasing maturity excites me, but the anxiety that my self-growth can't keep pace with AI's development also troubles me.
How Can Human-in-the-Loop Save Agents from Rogue Actions?
Today, let's continue our discussion on LangGraph practices. This time, we're talking about human-in-the-loop. After all, no matter how intelligent an agent is, it's still prone to pulling "rogue moves" — like fabricating data or making decisions beyond its authority. human-in-the-loop is essentially installing a "brake" on the agent, allowing humans to step in and make decisions at critical moments. In this article, we'll start from the basic concepts, clarify how it differs from multi-turn conversations, then implement this functionality using LangGraph, and see how it's applied in practice in our TinyCodeBase project. What is human-in-the-loop?
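As a minimal sketch of the "brake" idea described above (illustrative only; the helper names here are my own and this is not the LangGraph API the article uses), a human approval gate can sit between the agent's decision and a sensitive tool call:

```python
# Human-in-the-loop sketch: a sensitive action is executed only after a
# human approval callback says yes. Names are hypothetical, not LangGraph API.

from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class PendingAction:
    name: str
    args: Dict[str, str]

def run_with_approval(action: PendingAction,
                      execute: Callable[[PendingAction], str],
                      approve: Callable[[PendingAction], bool]) -> str:
    """Run `execute` only if the human `approve` callback permits it."""
    if approve(action):
        return execute(action)
    return f"action {action.name!r} rejected by human reviewer"

# Usage: a policy that auto-approves reads but blocks destructive actions.
result = run_with_approval(
    PendingAction("delete_table", {"table": "users"}),
    execute=lambda a: f"executed {a.name}",
    approve=lambda a: not a.name.startswith("delete"),
)
print(result)
```

In a real graph framework the "pause for approval" step would suspend and resume the agent's state rather than call a synchronous callback, but the control-flow shape is the same.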
As Much as I Used to Dislike LLM Frameworks, LangGraph Is Just That Good
When it comes to LLM frameworks, were you like me in thinking that things like LangChain were a bit of an overkill? I always felt a few lines of Python could handle everything. That's what I used to think, until I encountered LangGraph... Wow, it's amazing! The smoothness with which it solves tool-calling problems made even a "hand-rolling purist" like me exclaim "bravo!" Today, let me take you through this experience and share the pitfalls I hit along the way. The official tutorial is divided into six steps; today we'll tackle the first three, and mastering them will solve 80% of Agent development problems.
The Clever Use of Gemini CLI Custom Commands
Introduction Gemini CLI is a command-line tool launched by Google that allows you to interact with Gemini models through the command line. Nowadays, I use Gemini CLI for many programming scenarios. When using Gemini CLI, I always feel a bit constrained—each project can only configure one gemini.md. Switching scenarios requires constant modifications, which is too troublesome!
Exploring LLM Agent Evaluation: Understanding the Underlying Logic
Why is LLM Evaluation So Difficult? In the AI community, LLM evaluation is nothing short of a "metaphysical scene": a certain domestic model tops GPT-4 on the C-Eval leaderboard, yet frequently outputs outdated policies in real financial consulting; GPT-4 can pass the bar exam, yet its accuracy plummets by 10% on elementary math problems with slightly tweaked numbers. As models evolve from "specialized experts" to "versatile Agents", traditional evaluation systems are facing unprecedented disruption. In traditional evaluation, the tasks are standardized, such as image classification, text extraction, and sentiment analysis. These tasks have fixed frameworks and standard outputs, so we can measure model performance with metrics like accuracy, recall, and F1 score.
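As a concrete reminder of what those traditional metrics actually compute, here is a minimal from-scratch sketch for a binary task (illustrative only, not taken from the article):

```python
# Accuracy, precision, recall, and F1 for a binary classification task,
# computed from scratch so the definitions stay explicit.

def binary_metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

m = binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
```

The difficulty the article points at is precisely that open-ended Agent outputs have no single `y_true` to compare against, so these closed-form metrics stop applying directly.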
Building Your Own Agent from Scratch
Today, we will systematically build a customized Agent from scratch. Following our previous exploration of Agent principles, this section will be based on the ReAct paradigm (for background, please refer to previous articles) and gradually implement a complete workflow for a lightweight LLM Agent. Step 1: Building the Tool Library In the ReAct paradigm, Agents rely on external tools to execute tasks. The following example is from the implementation in tools.py, which includes a Google search tool function.
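The excerpt mentions a `tools.py` with a Google search tool. As a hedged sketch (the function and registry names are my own, and a real implementation would call an actual search API such as Google's or Tavily's), a ReAct-style tool library might look like:

```python
# Minimal ReAct-style tool registry (illustrative; the article's tools.py
# wraps a real Google search API, which is stubbed out here).

from typing import Callable, Dict

TOOLS: Dict[str, Callable[[str], str]] = {}

def register_tool(name: str):
    """Decorator that adds a function to the Agent's tool library."""
    def wrap(fn: Callable[[str], str]):
        TOOLS[name] = fn
        return fn
    return wrap

@register_tool("google_search")
def google_search(query: str) -> str:
    # Placeholder: a real tool would issue an HTTP request to a search API.
    return f"[stub results for: {query}]"

def dispatch(action: str, action_input: str) -> str:
    """What the ReAct loop does after parsing 'Action: ...' from the LLM."""
    if action not in TOOLS:
        return f"unknown tool: {action}"
    return TOOLS[action](action_input)

print(dispatch("google_search", "ReAct paradigm"))
```

The registry lets the ReAct loop map the model's declared `Action` string to a callable, and return the observation back into the prompt on the next turn.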
An In-Depth Look at AI Agents for Large Language Models
I'm preparing to continue learning the TinyAgent project and integrate it with my previous project TinyCodeRAG. Moving forward, I plan to focus on building a code knowledge base project suitable for simple personal deployment, so I've renamed the repository to TinyCodeBase (https://github.com/codemilestones/TinyCodeBase). Welcome to give it a star and stay tuned. Before starting to build Agent capabilities, let's first discuss what an Agent is.
Building TinyCodeRAG Step-by-Step: A Lightweight Code Knowledge Base Solution
In the previous article, we broke down the core components of RAG systems. Today, let's do something even cooler—personally build a TinyCodeRAG optimized specifically for code! 💡 Quick primer: RAG (Retrieval-Augmented Generation) technology effectively alleviates the "hallucination" problem of large models by combining external knowledge bases with AI generation capabilities. You might wonder: "With TinyRAG already excellent, why reinvent the wheel?" Two reasons:
Breaking Down RAG Systems
Hey there! Everyone's actively diving into the LLM transformation wave these days! From company leaders wanting to boost efficiency, to teachers using it for research, to everyday folks seizing opportunities for financial freedom—I'm one of them, transitioning from traditional development to LLMs. After all, traditional software development is visibly in decline, while LLM development is on the rise! So, I want to share the insights and experiences from my learning process, hoping to help you all out. Today's first post: let's talk about breaking down RAG systems. What Do Typical RAG System Products Look Like?
Four Years of Part-Time Master's Study: The Journey Culminates!
After four busy yet fulfilling years, I have finally completed my part-time master's degree! This journey has been filled with challenges, requiring me to constantly find a balance between work and coursework. Though demanding, my passion for exploring knowledge never wavered, and every step of learning and growth felt deeply rewarding. My Graduate Study Footprints Starting Line (2021): I took the 2021 graduate entrance exam and officially began my master's journey in September of that same year, studying under the guidance of my mentor.