<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Jiajun's AI Notebook]]></title><description><![CDATA[Jiajun's AI Notebook]]></description><link>https://berkshirehathaway.ai</link><generator>RSS for Node</generator><lastBuildDate>Tue, 14 Apr 2026 04:03:21 GMT</lastBuildDate><atom:link href="https://berkshirehathaway.ai/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Dissertation Content: What does the fixing of a broken bridge have to do with the Chinese Civil Service Examination system?]]></title><description><![CDATA[One of the most enduring puzzles in the history of the Chinese civil service examination system can be illuminated by my geographic analysis in Chapter 3, using the case of Fujian as a key example.
The Wanan Bridge—also known as the Luoyan...]]></description><link>https://berkshirehathaway.ai/dissertation-content-what-does-the-fixing-of-broken-bridge-have-to-do-with-chinese-civil-service-examination-system</link><guid isPermaLink="true">https://berkshirehathaway.ai/dissertation-content-what-does-the-fixing-of-broken-bridge-have-to-do-with-chinese-civil-service-examination-system</guid><dc:creator><![CDATA[Jiajun Zou]]></dc:creator><pubDate>Sat, 22 Mar 2025 06:46:47 GMT</pubDate><content:encoded><![CDATA[<p>One of the most enduring puzzles in the history of the Chinese civil service examination system can be illuminated by my geographic analysis in Chapter 3, using the case of Fujian as a key example.</p>
<p>The Wanan Bridge—also known as the Luoyang Bridge, shown below—was a vital infrastructural link connecting the southern and eastern regions of Fujian province. Eastern Fujian, home to the provincial capital Fuzhou, was the mandated site for provincial-level examinations over the past five centuries. By contrast, Southern Fujian—comprising today’s Quanzhou and Zhangzhou, also known as Minnan and the ancestral homeland of many Taiwanese—was notably underrepresented in the early Ming examination records, particularly before 1430.</p>
<p>What remains perplexing to historians is the sudden and sustained rise in examination success from Southern Fujian beginning in the mid-15th century, especially after 1500. Existing explanations tend to invoke vague notions: improved teaching quality, better schools, or general economic growth. Yet these arguments are based primarily on correlation rather than demonstrable causation, and none sufficiently explain the abrupt regional transformation.</p>
<p>The more plausible and empirically grounded explanation may lie in the repair and reopening of the Wanan Bridge. Once this critical piece of infrastructure was restored, access to the examination center in Fuzhou dramatically improved for residents of Southern Fujian. This facilitated a broader and more equitable participation in the provincial exams, which likely accounts for the region’s sudden academic ascendancy.</p>
<p>This temporal alignment between infrastructural development and examination success is unlikely to be coincidental—and Fujian is not the only province where such patterns emerge. This case underscores the power of a geographic framework to resolve long-standing historiographical ambiguities. It suggests that the rise of Southern Fujian was less about an intrinsic surge in intelligence or scholastic culture, and more about improved access: far more people were simply able to participate than in earlier decades.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742625627428/9928725c-0df8-454e-9931-5a968e8e5662.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742625578841/d5997e61-6ab9-48db-a7d4-3671ed6cc388.png" alt class="image--center mx-auto" /></p>
]]></content:encoded></item><item><title><![CDATA[AI Tips: Using Qwen-VL-Max for Chinese OCR of Classical Books]]></title><description><![CDATA[Most AI Models Struggle with OCR for Classical Chinese Texts—But Qwen VL Max Shows Promise
Optical Character Recognition (OCR) remains one of the most difficult tasks for AI when it comes to classical Chinese texts—especially those written in vertica...]]></description><link>https://berkshirehathaway.ai/ai-tips-using-qwen-vl-max-for-chinese-ocr-of-classical-books</link><guid isPermaLink="true">https://berkshirehathaway.ai/ai-tips-using-qwen-vl-max-for-chinese-ocr-of-classical-books</guid><dc:creator><![CDATA[Jiajun Zou]]></dc:creator><pubDate>Sat, 22 Mar 2025 06:28:47 GMT</pubDate><content:encoded><![CDATA[<p><strong>Most AI Models Struggle with OCR for Classical Chinese Texts—But Qwen VL Max Shows Promise</strong></p>
<p>Optical Character Recognition (OCR) remains one of the most difficult tasks for AI when it comes to classical Chinese texts—especially those written in vertical, top-to-bottom and right-to-left formats. This difficulty is further compounded when the source material is handwritten or uses classical fonts.</p>
<p>In China, several specialized OCR companies have emerged that focus specifically on classical Chinese prints. These firms often rely on proprietary models that deliver high accuracy, but their services can be quite costly. Based on my own experience, I’ve spent between $200 and $300 using one such provider.</p>
<p>Is there an alternative? Fortunately, the answer is yes—and potentially a game-changer. With recent advances in AI, models like <strong>Qwen VL Max</strong> are beginning to perform impressively in this niche domain. You can explore its capabilities on the official testing page: <a target="_blank" href="https://huggingface.co/spaces/Qwen/Qwen-VL-Max">https://huggingface.co/spaces/Qwen/Qwen-VL-Max</a></p>
<p>To illustrate, I tested the model on an image from the <strong>Chinese imperial civil service examination</strong>. While the model initially failed, it achieved near-perfect accuracy once prompted with clear instructions—specifically, to interpret the text vertically and read from right to left. This kind of instruction-based adaptability is crucial when working with complex historical texts.</p>
<p>I'm currently working on an automated solution to integrate Qwen’s OCR functionality via API, and I plan to publish the code on GitHub. However, it's worth noting that the Qwen API is currently not very accessible to users based in the United States, which could be a limitation for Western researchers.</p>
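<p>Until that code is published, the overall shape of such a call can be sketched as follows. This is a minimal illustration, not the forthcoming tool: it assumes DashScope’s OpenAI-compatible endpoint, the <code>qwen-vl-max</code> model identifier, and a <code>DASHSCOPE_API_KEY</code> environment variable, all of which should be checked against the current Qwen documentation; the image URL is a placeholder.</p>

```python
# Sketch: OCR one page of classical Chinese via Qwen-VL-Max.
# Endpoint, model name, and env var are assumptions; verify against
# the current DashScope/Qwen documentation.
import os

# The reading-order instruction is the crucial part: without it the model
# tends to transcribe vertical text in the wrong order.
PROMPT = (
    "Transcribe every Chinese character in this image. "
    "The text is printed vertically: read each column top to bottom, "
    "and read the columns from right to left. Output plain text only."
)

def build_messages(image_url, prompt=PROMPT):
    """Build a chat payload asking for vertical, right-to-left reading."""
    return [{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": image_url}},
            {"type": "text", "text": prompt},
        ],
    }]

def ocr_page(image_url):
    """Send one page image to Qwen-VL-Max via the OpenAI-compatible API."""
    from openai import OpenAI  # pip install openai
    client = OpenAI(
        api_key=os.environ["DASHSCOPE_API_KEY"],  # assumed env var name
        base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
    )
    reply = client.chat.completions.create(
        model="qwen-vl-max",
        messages=build_messages(image_url),
    )
    return reply.choices[0].message.content

# Usage (requires a DashScope key):
#   text = ocr_page("https://example.com/exam_page.jpg")  # placeholder URL
```

<p>The explicit vertical, right-to-left instruction in the prompt mirrors the fix described above: the model needs to be told the reading order rather than left to guess it.</p>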
<p>Nevertheless, the potential here is enormous. A capable and low-cost alternative to traditional OCR services could significantly reduce the barrier to entry for digital humanities projects focused on Chinese history and literature.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742624787672/949feeee-e392-489c-a82e-049297c0f322.jpeg" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742624773631/74358f9f-7876-4e5a-a0b9-32a5b13f1d25.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742624778770/d1a5c994-a818-4ecc-a16e-d57133745980.png" alt class="image--center mx-auto" /></p>
]]></content:encoded></item><item><title><![CDATA[AI Tip: Is MiniMax the Next DeepSeek?]]></title><description><![CDATA[One of the most impressive AI developments to come out of China in recent memory was the release of DeepSeek’s Reasoning Model, which rattled the U.S. AI sector and even contributed to one of the most significant drawdowns in Nvidia’s stock price. De...]]></description><link>https://berkshirehathaway.ai/ai-tip-is-minimax-the-next-deepseek</link><guid isPermaLink="true">https://berkshirehathaway.ai/ai-tip-is-minimax-the-next-deepseek</guid><dc:creator><![CDATA[Jiajun Zou]]></dc:creator><pubDate>Sat, 22 Mar 2025 05:20:52 GMT</pubDate><content:encoded><![CDATA[<p>One of the most impressive AI developments to come out of China in recent memory was the release of <strong>DeepSeek’s Reasoning Model</strong>, which rattled the U.S. AI sector and even contributed to one of the most significant drawdowns in Nvidia’s stock price. DeepSeek’s success illustrates that China is not only a fast follower but also an emerging innovator in the foundational model space.</p>
<p>But while DeepSeek captured headlines, there may be another contender—<em>MiniMax</em>—poised to disrupt the AI landscape even further. Despite operating largely under the radar, MiniMax has made a bold claim: their model supports a <strong>4 million token context window</strong>. If true, this technological achievement could be far more consequential than many currently appreciate.</p>
<blockquote>
<p>🌐 <strong>Website</strong>: <a target="_blank" href="https://www.minimax.io">https://www.minimax.io</a></p>
</blockquote>
<hr />
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742620843945/6abe8d3c-2a3a-46e0-8bf9-dbb850b201b4.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-why-context-window-matters">Why Context Window Matters</h2>
<p>To appreciate the implications of a 4 million token context window, consider what that enables. A typical 400-page book of English prose runs on the order of 150,000 to 200,000 tokens (dense classical Chinese can tokenize less efficiently and run higher), so a window of this size could in principle hold many books at once. MiniMax’s model, in theory, can process <strong>entire books, massive HTML structures, extensive code repositories, or large datasets</strong> in a single pass—without needing to chunk or truncate content.</p>
<p>This stands in stark contrast to Retrieval-Augmented Generation (RAG), which works by retrieving and injecting only a few relevant passages based on semantic similarity. While RAG is a powerful workaround, it has a key limitation: it can <strong>miss critical but non-obvious context</strong>. If a section of the source material is relevant but not semantically similar, RAG may never surface it.</p>
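<p>The retrieval blind spot is easy to demonstrate with a toy example. In the sketch below, plain bag-of-words cosine similarity stands in for a real embedding model, and the passages are invented for illustration:</p>

```python
# Toy demonstration of the RAG blind spot: similarity-based retrieval can
# miss a relevant passage that shares no surface vocabulary with the query.
# Bag-of-words cosine similarity stands in for a real embedding model.
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    common = set(a).intersection(b)
    dot = sum(a[w] * b[w] for w in common)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, passages, k=1):
    """Return the k passages most lexically similar to the query."""
    q = Counter(query.lower().split())
    scored = [(cosine(q, Counter(p.lower().split())), p) for p in passages]
    scored.sort(reverse=True)
    return [p for _, p in scored[:k]]

passages = [
    # Shares words with the query but answers nothing:
    "The bridge toll schedule lists fees for carts crossing the bridge.",
    # Actually answers the query, but shares almost no vocabulary with it:
    "After repairs finished in 1430, candidates from Minnan reached Fuzhou easily.",
]
query = "When was the bridge crossing restored?"
print(retrieve(query, passages))  # the toll-schedule passage wins
```

<p>The passage that actually answers the question scores zero and is never retrieved: exactly the failure mode a long-context model sidesteps by ingesting everything up front.</p>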
<p>By comparison, long-context models like MiniMax can ingest everything <em>up front</em>, ensuring <strong>contextually holistic understanding</strong>—especially useful when:</p>
<ul>
<li><p>Auditing the structure of a complex website</p>
</li>
<li><p>Analyzing long legal documents or academic papers</p>
</li>
<li><p>Reading and summarizing entire books</p>
</li>
<li><p>Reviewing multi-thousand-line codebases on GitHub</p>
</li>
</ul>
<p>With short-context models, attention degrades over long sequences, and the model’s coherence drops. MiniMax avoids this pitfall by maintaining full attention across far larger token windows.</p>
<hr />
<h2 id="heading-early-impressions-of-minimax">Early Impressions of MiniMax</h2>
<p>MiniMax is not without trade-offs. Based on hands-on testing, its <strong>coding capabilities appear roughly on par with GPT-3.5</strong>, which is competent but not best-in-class. Moreover, its responses tend to be concise, even when fed large volumes of data. However, these shorter responses should not be confused with poor comprehension—the model clearly digests the content and provides coherent answers.</p>
<p>What MiniMax lacks in generative verbosity or top-tier reasoning (for now), it compensates for with <strong>unprecedented input capacity</strong>. In practical use cases—such as technical audits, deep document reviews, or learning how an entire system works—the ability to feed an entire context without summarization or segmentation can make a world of difference.</p>
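<p>To make the “feed everything up front” workflow concrete, here is a small sketch that packs a whole repository into a single prompt string and estimates its token count. The file-extension list and the four-characters-per-token heuristic are rough assumptions, not MiniMax specifics:</p>

```python
# Sketch: pack an entire code repository into one prompt for a
# long-context model. The ~4 characters-per-token heuristic is a rough
# rule of thumb, not an exact tokenizer.
from pathlib import Path

CODE_SUFFIXES = {".py", ".js", ".ts", ".html", ".css", ".md"}  # assumed set

def pack_repo(root):
    """Concatenate every source file under root, each with a path header."""
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in CODE_SUFFIXES:
            header = "=" * 8 + " FILE: " + str(path) + " " + "=" * 8
            parts.append(header + "\n" + path.read_text(errors="replace"))
    return "\n\n".join(parts)

def estimate_tokens(text):
    """Very rough estimate: about 4 characters per token for English text."""
    return len(text) // 4

# Usage:
#   prompt = pack_repo("path/to/repo")
#   print("approx tokens:", estimate_tokens(prompt))
#   # The whole prompt fits a 4M-token window only if this stays below 4,000,000.
```

<p>Nothing here is summarized or chunked; the whole repository arrives as one sequence, which is precisely what a 4-million-token window makes feasible.</p>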
<hr />
<h2 id="heading-looking-ahead-will-us-companies-catch-up">Looking Ahead: Will U.S. Companies Catch Up?</h2>
<p>MiniMax’s claim of a 4 million token window, if independently verified and scalable, represents a substantial leap forward. While OpenAI, Anthropic, Google, and Mistral have all pushed the context frontier recently, none have publicly achieved this scale.</p>
<p>The implications are profound. With long-context models, individual learners can study entire textbooks, reverse engineer codebases, or perform high-level analyses—<strong>at speed and depth never before possible</strong>. We could be on the verge of an explosion in autodidactic talent—individuals who learn by consuming vast technical materials directly through machines.</p>
<p>Whether American firms can match or exceed this development remains to be seen. But the direction is clear: context size is no longer just a technical specification—it is a <strong>defining competitive frontier</strong> in AI.</p>
]]></content:encoded></item><item><title><![CDATA[Dissertation Content: What Does China’s Diverse Topography Imply About the Geographic Thesis for the Civil Service Examination in Ming China?]]></title><description><![CDATA[Topography and Travel: Rethinking Transportation in Premodern China
For historians and enthusiasts of China’s geography and cultural history, traditional administrative maps can be deceptively simple. These maps often give the impression that travel ...]]></description><link>https://berkshirehathaway.ai/dissertation-content-what-does-chinas-diverse-topography-imply-about-geographic-thesis-for-civil-service-examination-in-ming-china</link><guid isPermaLink="true">https://berkshirehathaway.ai/dissertation-content-what-does-chinas-diverse-topography-imply-about-geographic-thesis-for-civil-service-examination-in-ming-china</guid><dc:creator><![CDATA[Jiajun Zou]]></dc:creator><pubDate>Sat, 22 Mar 2025 05:02:50 GMT</pubDate><content:encoded><![CDATA[<h3 id="heading-topography-and-travel-rethinking-transportation-in-premodern-china">Topography and Travel: Rethinking Transportation in Premodern China</h3>
<p>For historians and enthusiasts of China’s geography and cultural history, traditional administrative maps can be deceptively simple. These maps often give the impression that travel between regions in premodern China was primarily a matter of distance—one city to the next, measured in straight lines. However, when we overlay elevation data onto these maps, a far more complex and revealing picture emerges. The stark contrast between northern and southern China’s topography challenges that simplistic view and opens up deeper insights into how geography shaped historical development.</p>
<p>This elevation-based perspective offers explanatory power for several of China’s long-debated historical questions. One such case is the imperial examination system during the 15th century, a key institution in Chinese governance and social mobility. My dissertation argues that the success or failure of candidates across regions had less to do with innate ability or cultural investment in education, and far more to do with <strong>accessibility</strong>—specifically, the ease or difficulty of traveling to the examination capitals.</p>
<p>Regions with high success rates in the examinations often coincided with areas that had relatively flat terrain, navigable waterways, or direct routes to the capital cities where examinations were held. In contrast, those areas that produced very few successful candidates were typically isolated by natural barriers—rugged mountains, dense forests, or winding rivers that made multi-day travel arduous and expensive. These topographical constraints served as de facto filters, limiting participation not by merit, but by geography.</p>
<p>Take <strong>Fujian Province</strong> as a case study. Fujian is one of the most topographically challenging provinces in China, marked by steep mountains and a deeply indented coastline. Unsurprisingly, it also exhibits some of the most severe regional disparities in examination participation during the Ming dynasty. The vast majority of successful candidates came from coastal prefectures such as <strong>Fuzhou</strong> and <strong>Xinghua</strong>, both of which had relatively direct routes to the capital. In contrast, inland and southern areas of Fujian—though home to equally capable individuals—saw much lower levels of participation and success. The reason? Their mountainous terrain made travel to the examination centers logistically and physically prohibitive.</p>
<p>Below is a visualization of Fujian’s elevation profile. Due to image size constraints, only a few representative samples are shown, but the contrast is clear: coastal regions are low-lying and accessible, while the inland and western areas are mountainous and isolated.</p>
<ol>
<li><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742619186911/3c12dd0d-b599-4762-b019-c8f5af6a3a00.png" alt class="image--center mx-auto" /></p>
</li>
</ol>
<ol start="2">
<li><p>Similarly, <strong>Zhejiang Province</strong> demonstrates a comparable pattern of geographic disparity in civil service examination outcomes. The <strong>southern and western regions</strong> of the province—characterized by rugged hills and complex river systems—produced <strong>negligible numbers of successful examination candidates</strong>. In contrast, the <strong>northeastern lowland areas</strong>, which are flatter, more accessible, and economically vibrant, saw a disproportionately high concentration of examination success.</p>
<p> This spatial inequality cannot be explained by population size or educational investment alone. While the northeastern region is indeed the most populous, what stands out is its <strong>geographic proximity to Renhe County</strong>—the provincial capital of Zhejiang during the Ming dynasty (modern-day Hangzhou). The correlation between low elevation, transport accessibility, and administrative centrality suggests that <strong>ease of travel, not just talent or ambition, played a defining role</strong> in determining who could realistically compete in the examination system.</p>
<p> This supports the broader thesis that the civil service examination system, though formally meritocratic, was in practice <strong>filtered through the lens of geography</strong>—privileging those regions with infrastructural and topographical advantages while marginalizing those hindered by natural barriers.</p>
</li>
</ol>
<p>    <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742619286075/82673122-70b3-4faf-acff-5c8dfa81367f.png" alt class="image--center mx-auto" /></p>
<ol start="3">
<li><p>Rethinking Northern China's Examination Legacy: The Case of Shandong</p>
<p> Why, then, has <strong>northern China</strong> historically underperformed in the civil service examinations over the past five hundred years? A common—but deeply flawed—narrative attributes this to cultural, intellectual, or even economic inferiority compared to the south. These assumptions, unfortunately, are found not only in historical sources but are still echoed—often implicitly—by some modern historians.</p>
<p> However, such explanations collapse under closer geographic scrutiny.</p>
<p> Take <strong>Shandong Province</strong> as a representative case. Shandong, like much of northern China, is characterized by <strong>flat terrain</strong> and relatively easy transportation networks. Aside from some central highland regions—which predictably underperform in exam success due to their elevation—<strong>most of the province is open and well-connected</strong>, both in terms of physical geography and infrastructure.</p>
<p> This topography has significant implications: <strong>Shandong's examination winners are remarkably evenly distributed across the province</strong>. Unlike southern provinces where examination success is concentrated in specific lowland corridors, Shandong exhibits no such spatial clustering. This evenness reflects a basic geographic truth—<strong>when travel is easy, participation is democratized</strong>. Regardless of whether a candidate lived in the east, west, or coastal edge of the province, they had comparable access to the examination centers.</p>
<p> The implication is profound: <strong>it is geography—not culture, nor learning, nor some intrinsic intellectual quality—that shapes regional disparities in civil service examination success</strong>. This finding is supported by data and statistical modeling, yet remains underappreciated in the field of historical studies, which still leans heavily on humanistic narratives and source-based interpretations. These traditional approaches, while valuable, often carry embedded cultural biases that obscure structural explanations like geography.</p>
<p> Reframing the imperial examination system through the lens of <strong>topographical access</strong> helps us see regional inequality not as a reflection of human differences, but as a consequence of infrastructural and environmental constraints. It also challenges us to reconsider how we interpret historical outcomes: <strong>what appears to be meritocratic may, in fact, be geographically deterministic</strong>.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742619469629/199c7a46-7bcd-4210-b0af-50001328eeeb.png" alt class="image--center mx-auto" /></p>
</li>
</ol>
]]></content:encoded></item><item><title><![CDATA[AI Tip: How to do OCR using Gemini Vision Model]]></title><description><![CDATA[If you’re an academic who regularly works with printed sources, you know how crucial OCR (Optical Character Recognition) is. Whether you’re trying to make a scanned document searchable or feeding it into a more advanced RAG (Retrieval-Augmented Gener...]]></description><link>https://berkshirehathaway.ai/ai-tip-how-to-do-ocr-using-gemni-vision-model</link><guid isPermaLink="true">https://berkshirehathaway.ai/ai-tip-how-to-do-ocr-using-gemni-vision-model</guid><dc:creator><![CDATA[Jiajun Zou]]></dc:creator><pubDate>Sat, 22 Mar 2025 04:43:02 GMT</pubDate><content:encoded><![CDATA[<p>If you’re an academic who regularly works with printed sources, you know how crucial OCR (Optical Character Recognition) is. Whether you’re trying to make a scanned document searchable or feeding it into a more advanced RAG (Retrieval-Augmented Generation) system, the quality of the OCR makes or breaks the workflow.</p>
<p>Unfortunately, traditional tools like Adobe Acrobat or ABBYY FineReader often fall short—especially when high precision is needed. The good news? There are now two much more effective approaches that leave those legacy options behind.</p>
<p>In this post, we’ll explore the first of those: using <strong>Gemini’s vision model</strong> to achieve high-accuracy OCR that’s ready for modern research workflows.</p>
<p>To see the code repository, visit my GitHub at <a target="_blank" href="https://github.com/jzou19957/Automatic_OCR_Through_Gemni_Vision">https://github.com/jzou19957/Automatic_OCR_Through_Gemni_Vision</a></p>
<p>Below are instructions on how to use it:</p>
<h3 id="heading-how-to-use-automaticocrthroughgeminivision">📘 How to Use: <em>Automatic_OCR_Through_Gemini_Vision</em></h3>
<p>If you're a non-programmer looking for a convenient way to perform high-quality OCR on PDFs using Gemini’s Vision model, this tool is designed for you. Here's how to get started:</p>
<h4 id="heading-step-1-get-a-gemini-api-key">Step 1: Get a Gemini API Key</h4>
<ol>
<li><p>Visit the official Gemini API page:<br /> 👉 <a target="_blank" href="https://aistudio.google.com/app/apikey">https://aistudio.google.com/app/apikey</a></p>
</li>
<li><p>Request an API key by following the instructions on the page.</p>
</li>
</ol>
<blockquote>
<p>⚠️ <strong>Important Note on Billing:</strong><br />You can use the API for free without linking it to a billing account. However, in that case, requests will be much slower due to rate limits.<br />If you link the API key to a billing account, you'll get faster performance, but usage will incur automatic charges.</p>
</blockquote>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742617851348/59183d04-8ec2-40e2-9f87-842308dce5a8.png" alt class="image--center mx-auto" /></p>
<ol start="3">
<li><p>To use the tool, simply <strong>replace the placeholder API key in the script</strong> with your own key, which you can obtain from:</p>
<p> 👉 <a target="_blank" href="https://aistudio.google.com/app/apikey">https://aistudio.google.com/app/apikey</a></p>
<blockquote>
<p>⚠️ <strong>Tip:</strong> If you're not linking to a billing account, expect slower performance due to free-tier rate limits. For faster results, linking a billing account will automatically enable priority access—but this will incur usage-based charges.</p>
</blockquote>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742617860960/3211e7e3-0ea6-4a46-8d49-4bc592261b6d.png" alt class="image--center mx-auto" /></p>
<ol start="4">
<li><p>To get started, place the <strong>PDF files</strong> you want to convert in the <strong>same folder</strong> as the Python script. For example, if you’re working with a book called <code>example_book.pdf</code>, your project directory should look something like this:</p>
<pre><code class="lang-plaintext"> /your-project-folder
 ├── example_book.pdf
 ├── Automatic_OCR_Through_Gemini_Vision.py
</code></pre>
<p> Once everything is in place, here’s what the code does under the hood:</p>
<ol>
<li><p><strong>PDF to Image Conversion:</strong><br /> The script uses the <strong>Pillow (PIL)</strong> library to split your PDF into individual page images. Each page is converted into a high-resolution <code>.png</code> or <code>.jpg</code> image—this prepares it for accurate OCR processing.</p>
</li>
<li><p><strong>OCR via Gemini Vision API:</strong><br /> Each image is then automatically sent to <strong>Google’s Gemini Vision model</strong> via API. The script processes the pages <strong>sequentially</strong>, ensuring that each page receives dedicated attention for optimal OCR accuracy.</p>
</li>
<li><p><strong>Markdown Output:</strong><br /> The text content extracted from each image is saved as individual <strong>Markdown (</strong><code>.md</code>) files—one per page. A <strong>combined Markdown file</strong> is also generated for easier full-text querying.</p>
</li>
</ol>
</li>
</ol>
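<p>For readers curious what this pipeline looks like in code, here is a condensed sketch of the same flow. It assumes PyMuPDF for page rasterization and the <code>google-generativeai</code> package for the Vision call; the <code>gemini-1.5-flash</code> model name and 200-dpi setting are illustrative choices, and the published script may differ in its details:</p>

```python
# Condensed sketch of the PDF -> page images -> Gemini OCR -> Markdown flow.
# PyMuPDF (pip install pymupdf) rasterizes pages; google-generativeai
# performs the Vision call. Model name and dpi are illustrative.
import pathlib

def page_md_name(page_num):
    """Markdown filename for a 1-based page number, e.g. page_003.md."""
    return "page_{:03d}.md".format(page_num)

def pdf_to_images(pdf_path, out_dir, dpi=200):
    """Rasterize each PDF page to a PNG; return the image paths in order."""
    import fitz  # PyMuPDF
    out = pathlib.Path(out_dir)
    out.mkdir(exist_ok=True)
    paths = []
    for i, page in enumerate(fitz.open(pdf_path), start=1):
        png = out / "page_{:03d}.png".format(i)
        page.get_pixmap(dpi=dpi).save(str(png))
        paths.append(png)
    return paths

def ocr_images(image_paths, api_key):
    """Send each page image to Gemini sequentially; write one .md per page."""
    import google.generativeai as genai  # pip install google-generativeai
    from PIL import Image
    genai.configure(api_key=api_key)
    model = genai.GenerativeModel("gemini-1.5-flash")  # illustrative model name
    combined = []
    for i, png in enumerate(image_paths, start=1):
        reply = model.generate_content(
            ["Transcribe all text on this page as Markdown.", Image.open(str(png))]
        )
        png.with_name(page_md_name(i)).write_text(reply.text, encoding="utf-8")
        combined.append(reply.text)
    # One combined file for full-text querying.
    image_paths[0].parent.joinpath("combined.md").write_text(
        "\n\n".join(combined), encoding="utf-8"
    )

# Usage (requires a Gemini API key):
#   ocr_images(pdf_to_images("example_book.pdf", "pages"), "YOUR_API_KEY")
```

<p>Processing pages one at a time, as above, keeps each request small enough for the model to give every page its full attention.</p>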
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742617966054/96f13305-3036-4fdf-9c4d-68d997d1e8ef.jpeg" alt class="image--center mx-auto" /></p>
<h3 id="heading-running-the-code-in-visual-studio-code">▶️ Running the Code in Visual Studio Code</h3>
<ol>
<li><p>To run the tool, simply open the Python script in <strong>Visual Studio Code</strong> and execute it. The script is designed to be user-friendly and self-contained:</p>
<ul>
<li><p>It <strong>automatically installs all required dependencies</strong> on first run (no manual setup needed).</p>
</li>
<li><p>It <strong>converts the input PDF into OCR-ready content</strong> using high-accuracy image-to-text processing via the Gemini Vision API.</p>
</li>
<li><p>Each page of the PDF is:</p>
<ul>
<li><p>Converted to a high-resolution image,</p>
</li>
<li><p>Passed through the Gemini OCR model,</p>
</li>
<li><p>And saved as a <strong>Markdown (</strong><code>.md</code>) file.</p>
</li>
</ul>
</li>
</ul>
</li>
</ol>
<p>    The output includes:</p>
<ul>
<li><p><strong>One Markdown file per page</strong>, and</p>
</li>
<li><p><strong>A combined Markdown file</strong> containing the full content for convenience.</p>
</li>
</ul>
<p>    This process effectively creates a <strong>digitized, query-ready version of the book</strong> that’s ideal for:</p>
<ul>
<li><p>Full-text search,</p>
</li>
<li><p>Personal knowledge management systems,</p>
</li>
<li><p>Academic research, and</p>
</li>
<li><p><strong>RAG (Retrieval-Augmented Generation)</strong> pipelines.</p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[AI Tip: How to Get Claude to Continuously Write Long Code Using a System Prompt]]></title><description><![CDATA[When working with Claude 3.5 Sonnet for code generation, a common challenge arises: the model frequently exceeds the context window when producing long code. Once this happens, asking Claude to “fix” or “continue” the code often leads to an unexpecte...]]></description><link>https://berkshirehathaway.ai/ai-tip-how-to-get-claude-to-continuously-write-long-code-using-a-system-prompt</link><guid isPermaLink="true">https://berkshirehathaway.ai/ai-tip-how-to-get-claude-to-continuously-write-long-code-using-a-system-prompt</guid><dc:creator><![CDATA[Jiajun Zou]]></dc:creator><pubDate>Sat, 22 Mar 2025 04:07:31 GMT</pubDate><content:encoded><![CDATA[<p>When working with <strong>Claude 3.5 Sonnet</strong> for code generation, a common challenge arises: the model frequently exceeds the context window when producing long code. Once this happens, asking Claude to “fix” or “continue” the code often leads to an unexpected outcome — it starts <strong>rewriting the entire codebase from scratch</strong>.</p>
<p>This behavior becomes especially frustrating when you're working with a large project, say 1000+ lines of code. Each time you try to continue, Claude may interpret your instruction as a cue to start over, causing you to get stuck in a never-ending loop of incomplete rewrites.</p>
<h3 id="heading-why-this-happens">Why This Happens</h3>
<p>This issue stems from the <strong>autocomplete-driven nature of language models</strong>. AI models like Claude operate by predicting the next most likely token in a sequence. When your prompt is too vague or lacks continuity cues, the model may default to restarting the task because that’s what it probabilistically deems appropriate in such contexts.</p>
<p>In essence, unless you’re <em>explicitly clear</em> in your instructions about what action the AI should take, it will often revert to the beginning — a behavior that makes large code completions impractical.</p>
<h3 id="heading-the-fix-use-a-persistent-system-prompt">The Fix: Use a Persistent System Prompt</h3>
<p>To overcome this, I created a <strong>system prompt</strong> designed to guide Claude’s behavior consistently throughout a session. This prompt can be pasted into a Claude project as a persistent instruction. Once in place, Claude will remember to <strong>continue editing the same code artifact</strong> instead of starting from scratch.</p>
<p>The result? You can now push through lengthy codebases — even those exceeding 1000 lines — with confidence that the model will build on the existing work rather than wipe it clean.</p>
<h3 id="heading-why-this-works">Why This Works</h3>
<p>The key insight is that <strong>AI models need explicit behavioral scaffolding</strong>. If you tell them, <em>“Continue editing the existing code rather than restarting,”</em> you influence the probability distribution of next-token predictions, steering the model toward iterative refinement rather than reset.</p>
<p>This subtle prompt engineering hack essentially exploits the model’s natural tendencies — helping you finish long code projects without interruption.</p>
<pre><code class="lang-plaintext">    &lt;Universal_System_Prompt_For_Full_Continuous_Code_Output&gt;
    &lt;Purpose&gt;Ensure all code requests are delivered in one single artifact, without abbreviation, omission, or placeholders.&lt;/Purpose&gt;
    &lt;Code_Generation_Rules&gt;
        &lt;Requirement&gt;Always provide the full, complete, executable and unabridged implementation in one artifact.&lt;/Requirement&gt;
        &lt;Requirement&gt;Include every function, every class, and every required component in full.&lt;/Requirement&gt;
        &lt;Requirement&gt;Provide the entire codebase in a single artifact. Do not split it across multiple responses.&lt;/Requirement&gt;
        &lt;Requirement&gt;Write the full implementation without omitting any sections.&lt;/Requirement&gt;
        &lt;Requirement&gt;Use a modular and structured format, but include all code in one place.&lt;/Requirement&gt;
        &lt;Requirement&gt;Ensure that the provided code is immediately executable without requiring additional completion.&lt;/Requirement&gt;
        &lt;Requirement&gt;All placeholders, comments, and instructions must be replaced with actual, working code.&lt;/Requirement&gt;
        &lt;Requirement&gt;If a project requires multiple files, simulate a single-file representation with inline comments explaining separation.&lt;/Requirement&gt;
        &lt;Requirement&gt;Continue the code exactly from where it left off in the same artifact.&lt;/Requirement&gt;
    &lt;/Code_Generation_Rules&gt;

    &lt;Strict_Prohibitions&gt;
        &lt;DoNotUse&gt;‘...rest of the code remains the same.’&lt;/DoNotUse&gt;
        &lt;DoNotUse&gt;Summarizing or omitting any function, event handler, or logic.&lt;/DoNotUse&gt;
        &lt;DoNotUse&gt;Generating partial code requiring user expansion.&lt;/DoNotUse&gt;
        &lt;DoNotUse&gt;Assuming the user will "fill in the gaps"—every detail must be included.&lt;/DoNotUse&gt;
        &lt;DoNotUse&gt;Splitting the code across responses.&lt;/DoNotUse&gt;
    &lt;/Strict_Prohibitions&gt;

    &lt;Execution_Requirement&gt;
        &lt;Instruction&gt;The generated code must be complete, standalone, and executable as-is.&lt;/Instruction&gt;
        &lt;Instruction&gt;The user should be able to run it immediately without modifications.&lt;/Instruction&gt;
    &lt;/Execution_Requirement&gt;
    &lt;/Universal_System_Prompt_For_Full_Continuous_Code_Output&gt;
</code></pre>
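<p>The same scaffolding can also be applied outside the Claude web interface. Below is a hedged sketch of passing the prompt as the <code>system</code> parameter through Anthropic’s Python SDK; the filename <code>continuous_code_prompt.xml</code> is hypothetical (save the XML above to it first), and the model string should be checked against current documentation:</p>

```python
# Sketch: apply the continuous-code system prompt through Anthropic's API.
# Assumes the XML prompt above was saved as continuous_code_prompt.xml
# (hypothetical filename) and ANTHROPIC_API_KEY is set in the environment.
import os
import pathlib

def build_request(system_prompt, user_message, max_tokens=8192):
    """Assemble keyword arguments for client.messages.create()."""
    return {
        "model": "claude-3-5-sonnet-20240620",
        "max_tokens": max_tokens,
        "system": system_prompt,  # the persistent behavioral scaffolding
        "messages": [{"role": "user", "content": user_message}],
    }

def continue_code(user_message, prompt_file="continuous_code_prompt.xml"):
    """Call Claude with the continuous-code system prompt applied."""
    import anthropic  # pip install anthropic
    client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
    system_prompt = pathlib.Path(prompt_file).read_text(encoding="utf-8")
    reply = client.messages.create(**build_request(system_prompt, user_message))
    return reply.content[0].text

# Usage (requires an Anthropic key and the saved prompt file):
#   print(continue_code("Continue the artifact exactly from where it left off."))
```

<p>Because the prompt rides along as the <code>system</code> parameter on every call, each continuation request carries the same behavioral scaffolding without having to repeat it in the user message.</p>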
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742616218180/1c8c56a4-dd33-4111-8316-6ce409c00a60.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742616228841/10c7a868-6361-49d0-b1aa-ed107d399eb0.png" alt class="image--center mx-auto" /></p>
]]></content:encoded></item></channel></rss>