Preprocessing Unstructured Data for LLM Applications

Learn to extract and normalize content from a wide variety of document types, including PDF, PowerPoint, Word, and HTML files, as well as tables and images, to expand the information accessible to your LLM.

What you’ll learn in this course

Enhancing a RAG system’s performance depends on efficiently processing diverse unstructured data sources.

In this course, you’ll learn techniques for representing all sorts of unstructured data, like text, images, and tables, from many different sources. You’ll then implement them to extend your LLM RAG pipeline to include Excel, Word, PowerPoint, PDF, and EPUB files.

Join this course and learn:

How to preprocess data for your LLM application development, focusing on how to work with different document types.
How to extract and normalize various documents into a common JSON format and enrich it with metadata to improve search results.
Techniques for document image analysis, including layout detection and vision transformers, to extract and understand PDFs, images, and tables.
How to build a RAG bot that can ingest different document types, such as PDFs, PowerPoints, and Markdown files.

Apply the skills you’ll learn in this course to real-world scenarios, enhancing your RAG application and expanding its versatility.
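
To make the idea of a common JSON format with metadata more concrete, here is a minimal sketch using only the Python standard library. It is not the course's own code or any particular library's API: the `Element` class, the element type names (`Title`, `NarrativeText`, `ListItem`), the metadata keys, and the helper functions `normalize_html`, `normalize_text`, and `to_json` are all assumptions chosen for illustration. It shows how two different source types can be reduced to the same list of typed elements, each tagged with metadata that a search step could later filter on.

```python
import json
from dataclasses import dataclass, field, asdict
from html.parser import HTMLParser


@dataclass
class Element:
    """One normalized piece of a document (hypothetical schema)."""
    type: str
    text: str
    metadata: dict = field(default_factory=dict)


class _HTMLElements(HTMLParser):
    """Collects text from a few block-level tags; nested markup is ignored."""
    # Assumed mapping from HTML tags to element types.
    TAG_TYPES = {"h1": "Title", "h2": "Title", "p": "NarrativeText", "li": "ListItem"}

    def __init__(self):
        super().__init__()
        self.items = []
        self._current = None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.TAG_TYPES:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            # Collapse runs of whitespace so text is comparable across sources.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.items.append((self.TAG_TYPES[tag], text))
            self._current = None


def normalize_html(html, filename):
    parser = _HTMLElements()
    parser.feed(html)
    parser.close()
    return [
        Element(kind, text, {"filename": filename, "filetype": "text/html", "element_index": i})
        for i, (kind, text) in enumerate(parser.items)
    ]


def normalize_text(text, filename):
    # Treat blank-line-separated blocks as paragraphs.
    blocks = [" ".join(b.split()) for b in text.split("\n\n") if b.strip()]
    return [
        Element("NarrativeText", b, {"filename": filename, "filetype": "text/plain", "element_index": i})
        for i, b in enumerate(blocks)
    ]


def to_json(elements):
    return json.dumps([asdict(e) for e in elements], indent=2)


if __name__ == "__main__":
    elements = normalize_html("<h1>Report</h1><p>Revenue grew.</p>", "report.html")
    elements += normalize_text("First note.\n\nSecond note.", "notes.txt")
    print(to_json(elements))
```

Because every element shares the same shape regardless of where it came from, downstream steps such as chunking, embedding, and metadata filtering do not need to know the original file type. Formats like PDF, PowerPoint, or images would need dedicated parsers or layout-detection models, which this sketch does not attempt.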

Price: Free
Language: English
Duration: 1 hour
Certificate: No
Course Pace: Self-paced
Course Level: Advanced
Course Category: LLM
Course Instructor: DeepLearning.AI
