We built a batch processing pipeline that pulls Open Educational Resources from over 50 repositories and ranks them by quality. Faculty can find free textbook replacements without spending a weekend searching.
Faculty across the institution wanted to adopt Open Educational Resources to cut student costs, but the ecosystem was a mess. Dozens of OER repositories existed (OpenStax, OER Commons, MERLOT, MIT OpenCourseWare, and others), each with different interfaces, metadata schemas, and quality levels. Finding a replacement for a $200 textbook meant manually searching each platform one at a time.
Even when faculty found OER materials, there was no good way to judge quality. A resource might look promising from its title but turn out to be a poorly formatted PDF with outdated content. Without any shared evaluation system, every faculty member started from scratch, repeating the same frustrating search their colleagues had already given up on.
The institution needed a way to pull resources from every major OER repository, normalize the metadata into one searchable index, and surface the best materials automatically. Faculty should be teaching, not searching.
Dozens of OER sources with different interfaces and metadata formats. No single place to search them all.
Faculty couldn't distinguish high-quality OER from outdated or poorly structured materials without downloading and reviewing each one.
Students were spending hundreds per semester on required textbooks. Free alternatives existed but were nearly impossible to find at scale.
Each faculty member spent hours repeating the same searches across the same repositories. None of that work could be reused.
We built a Python batch processing system with modular connectors for 50+ OER repositories. Each connector handles a source's unique API or scraping requirements and pulls resource metadata, download links, and PDFs into a unified staging area.
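The connector pattern described above can be sketched as a small Python interface. The class and field names here are illustrative assumptions, not the system's actual code: each connector encapsulates one source's quirks and emits records in a common staging shape.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field

# Hypothetical staging record -- field names are assumptions for illustration.
@dataclass
class RawResource:
    source: str
    title: str
    download_url: str
    metadata: dict = field(default_factory=dict)

class Connector(ABC):
    """One connector per repository, handling that source's API or scraping quirks."""
    name: str

    @abstractmethod
    def fetch(self) -> list[RawResource]:
        """Pull resource metadata and download links for staging."""

class OpenStaxConnector(Connector):
    name = "openstax"

    def fetch(self) -> list[RawResource]:
        # A real connector would call the source's API or scrape its pages;
        # this stub just demonstrates the contract.
        return [RawResource(
            source=self.name,
            title="Psychology 2e",
            download_url="https://openstax.org/details/books/psychology-2e",
            metadata={"license": "CC BY 4.0"},
        )]

def harvest(connectors: list[Connector]) -> list[RawResource]:
    """Run every connector and pool results into one staging list."""
    staged: list[RawResource] = []
    for connector in connectors:
        staged.extend(connector.fetch())
    return staged
```

The key design choice is that the pipeline only ever sees `RawResource` objects, so adding repository number 51 means writing one new `fetch` method and nothing else.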
Every source uses different metadata formats. Some follow Dublin Core, others use custom schemas, and many have inconsistent or missing fields. We built a schema mapping layer that normalizes everything into a consistent structure: subject, format, license, author, publication date, and content type.
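A minimal sketch of that mapping layer, assuming per-schema field dictionaries (the mappings shown are illustrative, not the system's actual tables). Each source schema maps its own keys onto the unified fields, and anything unmapped or missing simply stays `None`:

```python
# Hypothetical per-schema field maps -- real Dublin Core uses these element
# names, but the target field names are assumptions for illustration.
FIELD_MAPS = {
    "dublin_core": {
        "dc:title": "title",
        "dc:creator": "author",
        "dc:subject": "subject",
        "dc:date": "publication_date",
        "dc:rights": "license",
        "dc:format": "format",
        "dc:type": "content_type",
    },
}

UNIFIED_FIELDS = (
    "title", "author", "subject", "format",
    "license", "publication_date", "content_type",
)

def normalize(record: dict, schema: str) -> dict:
    """Map one source record into the unified structure; missing fields become None."""
    mapping = FIELD_MAPS[schema]
    unified = {f: None for f in UNIFIED_FIELDS}
    for source_key, value in record.items():
        target = mapping.get(source_key)  # silently drop unmapped keys
        if target is not None:
            unified[target] = value
    return unified
```

Keeping absent fields as explicit `None` values (rather than omitting them) is what lets faceted search treat "license unknown" as its own filterable bucket.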
I integrated OpenAI to analyze downloaded PDFs and resource descriptions. The system scores each resource on content depth, organization, recency, citation density, and pedagogical value. Faculty can trust the top-ranked results without reviewing every PDF themselves.
We built a PHP web interface backed by MySQL where faculty can browse, search, and filter indexed resources by subject, format, quality score, and license type. A companion CLI tool handles batch harvesting, re-indexing, and re-scoring on a schedule.
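The companion CLI's three jobs (harvesting, re-indexing, re-scoring) map naturally onto subcommands. A minimal sketch using Python's standard `argparse`; the command and flag names are assumptions, not the tool's actual interface:

```python
import argparse

def build_cli() -> argparse.ArgumentParser:
    """Hypothetical subcommand layout for the batch CLI."""
    parser = argparse.ArgumentParser(prog="oer-pipeline")
    sub = parser.add_subparsers(dest="command", required=True)

    # Pull new resources, optionally limited to specific connectors.
    harvest = sub.add_parser("harvest", help="pull resources from connectors")
    harvest.add_argument("--source", action="append",
                         help="repeatable; limit the run to named connectors")

    # Rebuild the search index from staged, normalized metadata.
    sub.add_parser("reindex", help="rebuild the search index")

    # Re-run AI quality scoring, optionally only on recently changed items.
    rescore = sub.add_parser("rescore", help="re-run quality scoring")
    rescore.add_argument("--since",
                         help="only rescore resources updated after this date")
    return parser
```

A layout like this lets a scheduler (e.g. cron) drive each stage independently, so a slow full harvest never blocks a quick nightly rescore.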
I can walk you through how an OER pipeline could help your faculty find free course materials without the manual search.
Automated connectors pull resources from 50+ OER repositories (OpenStax, MERLOT, OER Commons, MIT OCW, and others) into a single searchable index.
Downloads and analyzes the actual PDF content, not just metadata, to rank resources by depth, structure, and pedagogical value.
OpenAI evaluates each resource on content depth, organization, recency, and pedagogical design. The result is a composite quality score faculty can actually trust.
Normalizes metadata from Dublin Core, custom APIs, and inconsistent fields into a unified schema so cross-repository search and comparison actually works.
PHP web interface lets faculty browse by subject, format, license, and quality score. Instant search across thousands of indexed resources with faceted filtering.
Recommends OER alternatives matched to specific textbooks and courses, with quality comparisons and adoption guidance for faculty.
Faculty search across 50+ sources at once. They can filter by subject, format, quality score, and license type.
Monitor harvesting progress across all connected repositories, with per-source status and error tracking.
AI-generated quality breakdown for each resource, covering content depth, organization, and pedagogical value.
Before this tool, I spent an entire weekend searching five different OER sites for one psychology textbook replacement. Now I type in my subject, sort by quality score, and have three solid options in under a minute. My students saved over $150 each this semester.
There's no shortage of free educational material online. The real problem is that every repository describes its resources differently. Building the schema mapping layer took longer than building the harvesting connectors, but it's what made the whole system usable. Without normalized metadata, search across sources is meaningless.
Early versions only scored resources based on their metadata descriptions. Faculty didn't trust it because descriptions are often vague or overly optimistic. When I added PDF-first analysis that actually reads the content, quality scores became meaningful and adoption followed. Faculty need to know the system looked at the same thing they would.
The initial CLI-only harvesting tool ran silently for hours. Faculty and administrators had no idea whether it was working, stuck, or finished. Adding a progress dashboard with per-source status and error counts turned a black box into something the team could monitor and trust to run on schedule.
Tell us about your OER goals. We'd like to hear how a centralized resource pipeline could save your faculty time and your students money.
No pitch. No pressure. Just a conversation about what might work.