News, views, & tips to do data science in byte-sized drabbles about my daily life as a data professional.

How Do GLAMs Use AI and LLMs for Content Management?

Lately I'm really interested in how GLAMs (galleries, libraries, archives, and museums) are using AI and LLMs (large language models) for content management labor, especially for identifying people or places in images and for generating image subjects, tags, and descriptions.

I watched Ben Zhao's keynote recording from the OR2025 conference on how LLMs and GenAI work, whether we can trust their output (authenticity), and the dangers and harms AI models pose to open repositories and libraries.

Ben's talk essentially asked whether LLMs are the next/right interface for information and data access. My takeaway was that, by design, today's LLMs on their own can't be helpful for automating cultural heritage data labor. LLMs are complex black-box tools, trained on trillions of tokens with billions of parameters, that predict what should or could come next without any memory, knowledge, or logic behind those predictions. Much of the labor we'd like to automate requires logic, knowledge, and discernment.

Our content management/DAM systems also have underlying issues: they're outdated; metadata is missing, incorrect, or incomplete; search is disjointed across multiple systems that don't talk to each other; and reconciliation is difficult when complex data needs diverge or become incompatible.

Some of the questions I still have about how to address AI-powered interfaces for content management labor relate to how we iterate in data work, sample sizes for QA, and transparency in documentation.

Introducing AI for descriptions or for identifying people in photos raises really compelling questions about how to address these underlying issues with an AI-powered interface for our labor. As a profession, regardless of which industry we might work in, we also have to grapple with the practicalities of how we do our work, like documentation and quality assurance.

  • Iteration: Will AI descriptions eventually fall into the current patterns of our labor, where descriptions are iterative processes meant to be remediated over time as data needs and standards change?

  • Sample size: How big should the human quality assurance sample size be for AI-described assets? How many outputs do we sample before it's "good enough"?

  • Transparency: How do we stay transparent about AI descriptions vs. human descriptions when, in the end, the database treats all descriptions the same?
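On the sample size question, one starting point (a rough sketch, not a policy recommendation) is the standard binomial sample-size formula: if we treat QA as estimating the error rate of AI-generated descriptions, n = z²·p·(1−p)/e² gives the number of assets to spot-check for a chosen margin of error. The error rate and margin below are illustrative assumptions, not numbers from any real project:

```python
import math

def qa_sample_size(expected_error_rate: float, margin: float, z: float = 1.96) -> int:
    """Rough QA sample size for estimating an error rate at ~95%
    confidence (z = 1.96), via the binomial approximation
    n = z^2 * p * (1 - p) / e^2. Assumes random sampling from a
    large pool of AI-described assets."""
    p = expected_error_rate
    return math.ceil(z**2 * p * (1 - p) / margin**2)

# If we suspect ~10% of AI descriptions need remediation and want the
# estimate within +/- 5 percentage points:
print(qa_sample_size(0.10, 0.05))  # -> 139
```

The formula doesn't answer what "good enough" means, but it turns the vague question into an explicit trade-off between how much human review time we spend and how confident we are in the measured error rate.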

I want to leave you with one final thought from a digital/data preservation perspective. When data is flattened into relational tables for CSV export, it will likely lose the context of whether a description is human- or AI-driven unless we literally build that in as a discrete column/field in the item record templates. I've yet to see much talk about preservation and AI (AI in the broad sense of computers doing tasks for humans, not necessarily GenAI, which is just one form of AI).
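Building that provenance in can be as simple as one extra field in the export. A minimal sketch, with hypothetical field names and records (any real template would use its own schema and controlled vocabulary for the source values):

```python
import csv

# Hypothetical item records: "description_source" preserves the
# human-vs-AI context that a flat CSV export would otherwise lose.
records = [
    {"item_id": "photo-001",
     "description": "Group portrait on the library steps",
     "description_source": "human"},
    {"item_id": "photo-002",
     "description": "Black-and-white photograph of a storefront",
     "description_source": "ai:image-captioning-model-v1"},
]

with open("export.csv", "w", newline="") as f:
    writer = csv.DictWriter(
        f, fieldnames=["item_id", "description", "description_source"])
    writer.writeheader()
    writer.writerows(records)
```

Recording a model identifier (rather than just "ai") also leaves a trail for future remediation, since we'd know which generations came from which model version.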

Curated Content: Data Fellowship

Applications are due February 28, 2026 for five PAID History and Archiving Fellowships with the Association for Computing Machinery (ACM). It's open to independent scholars and scholars affiliated with an institution. Past funded projects include the evolution of digital content management systems, the history of Wi-Fi deregulation, the rise of computing at American universities, and a Y2K oral history.

  • You need a letter of endorsement from a scholarly or cultural organization.

  • Stipend: $4,000 USD

I’m Open to Work

My books for 2026 are open. I’m looking for remote freelance/consulting, part-time, and contract work doing database migrations, data assessments, digital projects, development/programming, digital asset management, analytics, narrative design, or data storytelling.

Have a cool project or kind of work in mind for me that's not listed here? DM me anyway!
