Importing Files and Website URLs
Add content to your agent without needing an integration.
Not every piece of useful content lives in a platform with an API. Some of it is in a PDF someone emailed around last year. Some of it is on a web page that never made it into the knowledge base. Some of it is in a spreadsheet that three people have a copy of.
Manual imports let you get that content into your agent without needing a formal integration. Upload the file, paste the URL, and your agent can start using it immediately. No developer needed, no account to connect, no ongoing sync to manage.
In this article, you'll learn:
- When to use manual imports vs. integrations
- How to upload a file
- How to crawl a website URL
- What to expect after importing
Manual Imports vs. Integrations
Manual imports are a one-time snapshot. Unlike connected integrations that auto-sync, manually imported content does not update automatically. If the file or page changes, you'll need to re-import it.
| | Manual Import | Integration |
|---|---|---|
| Setup | Immediate, no authentication | Requires connecting an account |
| Updates | Manual - re-import when content changes | Automatic - stays in sync |
| Best for | One-off documents, static content | Frequently updated content |
Use manual imports when:
- You don't have an integration for the platform your content lives in
- The content is static and unlikely to change often
- You want to quickly add a specific document without setting up a full integration
Uploading a File
1. Go to the Sources tab.
2. Click + Add Knowledge.
3. In the Add Knowledge Sources modal, click Files under the Manual Imports section in the left panel.
4. Either drag and drop your file into the upload area, or click browse files to select it from your computer.
5. Wait for the upload to complete - Outlearn will process the file and add it to your Sources list.

Supported file formats: PDF, DOCX, CSV, XLSX, TXT, MD, HTML
Files that are password protected or heavily image-based (e.g., scanned PDFs with no selectable text) may not process correctly. Use text-based files wherever possible.
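If you're preparing a batch of files, a quick pre-flight check against the supported extensions can save you a failed upload. This is an illustrative sketch, not part of Outlearn itself - it simply encodes the format list above:

```python
from pathlib import Path

# Upload formats listed in the docs above
SUPPORTED = {".pdf", ".docx", ".csv", ".xlsx", ".txt", ".md", ".html"}

def is_supported(filename: str) -> bool:
    """Return True if the file extension is one Outlearn can process."""
    return Path(filename).suffix.lower() in SUPPORTED
```

Note this only checks the extension - it won't catch password-protected or scanned, image-only PDFs, which you still need to verify by opening the file.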
Crawling a Website URL
Website crawling lets you add content from any publicly accessible web page - your product site, a public help center, a landing page, or any other URL.
1. Go to the Sources tab.
2. Click + Add Knowledge.
3. In the Add Knowledge Sources modal, click Website URL under the Manual Imports section in the left panel.
4. Enter the URL you want to crawl in the input field (starting with https://).
5. Choose your crawl options:
   - Crawl Links On This Page (checked by default): Outlearn will follow all links on the page and crawl the entire domain - not just the single URL you entered. Recommended for most websites.
   - Deep Crawl: Fetches links inside linked pages recursively - up to 500 pages. Use this for large sites where you want maximum coverage.
   - Custom Headers: Add custom HTTP headers to the crawl request. This is useful if the page is behind a soft authentication layer or requires specific headers to load correctly. If you're not sure what this means, you don't need it.
6. Click Start Crawling.

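The values you'd enter under Custom Headers are ordinary HTTP header key/value pairs. If you want to confirm locally that a page loads with a given header before entering it in the crawl form, here's a sketch using Python's standard urllib - the header names and token below are hypothetical examples, not values Outlearn requires:

```python
import urllib.request

# Hypothetical custom headers - the same key/value pairs you would
# enter in the Custom Headers fields of the crawl dialog.
headers = {
    "User-Agent": "ExampleCrawler/1.0",
    "X-Access-Token": "example-token",  # stand-in for a soft-auth header
}

req = urllib.request.Request("https://example.com/docs", headers=headers)
# Uncomment to actually fetch the page with those headers:
# with urllib.request.urlopen(req) as resp:
#     html = resp.read()
```

If the page renders correctly with the headers but not without them, those are the headers to add to the crawl.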
Outlearn will crawl the URL and process the content. Depending on the size of the site, this may take a few minutes. Once done, it will appear in your Sources list.
Only publicly accessible pages can be crawled. Password-protected pages, pages behind a login wall, or pages that block automated crawlers will not be accessible.
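Outlearn doesn't document its crawler internals, but as a rough mental model, a domain-scoped crawl with a page cap behaves like this breadth-first sketch. The `get_links` callable stands in for fetching a page and extracting its links, which keeps the sketch free of network access:

```python
from collections import deque
from urllib.parse import urlparse

def same_domain(seed: str, url: str) -> bool:
    """True when url lives on the same host as the seed URL."""
    return urlparse(url).netloc == urlparse(seed).netloc

def crawl(seed, get_links, max_pages=500):
    """Breadth-first crawl from seed, staying on one domain.

    get_links(url) should return the list of links found on that page.
    max_pages mirrors the 500-page ceiling mentioned for Deep Crawl.
    """
    seen, queue, order = {seed}, deque([seed]), []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            if link not in seen and same_domain(seed, link):
                seen.add(link)
                queue.append(link)
    return order
```

The same-domain filter is why starting the crawl at a section URL (e.g., a /support path) keeps the import focused: links pointing off that host are never queued.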
After Importing
Once your file or URL is processed, it will appear in your Sources list like any other source. From there you can:
- Change accessibility using the dropdown next to the source.
- Toggle it on or off using the In Use toggle.
- Delete it using the three-dot menu.
Manual imports do not have a Resync option - because they don't auto-sync, there's nothing to resync. To update the content, delete the existing source and re-import the updated file or re-crawl the URL.
Best Practices
- Name your files clearly before uploading - the file name becomes the source name in your Sources list, so "Q4 Product FAQ v3 FINAL.pdf" is harder to manage than "Product FAQ.pdf".
- Re-import files promptly when the underlying document changes - your agent will keep using the old version until you do.
- For website crawls, start with Crawl Links On This Page before trying Deep Crawl - it covers most sites well and is faster to process.
- Avoid crawling your entire website if only part of it is relevant to your agent. Crawl specific sections (e.g., yoursite.com/support) rather than the root domain to keep your sources focused.