Importing Files and Website URLs
Add content to your agent without needing an integration.
Not every piece of useful content lives in a platform with an API. Some of it is in a PDF someone emailed around last year. Some of it is on a web page that never made it into the knowledge base. Some of it is in a spreadsheet that three people have a copy of.
Manual imports let you get that content into your agent without needing a formal integration. Upload the file, paste the URL, and your agent can start using it immediately. No developer needed, no account to connect, no ongoing sync to manage.
In this article, you'll learn:
- When to use manual imports vs. integrations
- How to upload a file
- How to crawl a website URL
- What to expect after importing
Manual Imports vs. Integrations
Manual imports are a one-time snapshot. Unlike connected integrations that auto-sync, manually imported content does not update automatically. If the file or page changes, you'll need to re-import it.
| | Manual Import | Integration |
|---|---|---|
| Setup | Immediate, no authentication | Requires connecting an account |
| Updates | Manual - re-import when content changes | Automatic - stays in sync |
| Best for | One-off documents, static content | Frequently updated content |
Use manual imports when:
- You don't have an integration for the platform your content lives in
- The content is static and unlikely to change often
- You want to quickly add a specific document without setting up a full integration
Uploading a File
1. Go to the Sources tab.
2. Click + Add Knowledge.
3. In the Add Knowledge Sources modal, click Files under the Manual Imports section in the left panel.
4. Either drag and drop your file into the upload area, or click browse files to select it from your computer.
5. Wait for the upload to complete - Outlearn will process the file and add it to your Sources list.

Supported file formats: PDF, DOCX, CSV, XLSX, TXT, MD, HTML
Files that are password protected or heavily image-based (e.g., scanned PDFs with no selectable text) may not process correctly. Use text-based files wherever possible.
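If you're preparing a batch of files, a quick pre-flight check against the supported extensions can save you a failed upload. This is an illustrative sketch, not part of Outlearn itself - it simply encodes the format list above:

```python
from pathlib import Path

# Upload formats listed in the docs above
SUPPORTED = {".pdf", ".docx", ".csv", ".xlsx", ".txt", ".md", ".html"}

def is_supported(filename: str) -> bool:
    """Return True if the file extension is one Outlearn can process."""
    return Path(filename).suffix.lower() in SUPPORTED
```

Note this only checks the extension - it won't catch password-protected or scanned, image-only PDFs, which you still need to verify by opening the file.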
Crawling a Website URL
Website crawling lets you add content from any publicly accessible web page - your product site, a public help center, a landing page, or any other URL.
1. Go to the Sources tab.
2. Click + Add Knowledge.
3. In the Add Knowledge Sources modal, click Website URL under the Manual Imports section in the left panel.
4. Enter the URL you want to crawl in the input field (starting with https://).
5. Choose your crawl options:
   - Crawl Links On This Page (checked by default): Outlearn will follow all links on the page and crawl the entire domain - not just the single URL you entered. Recommended for most websites.
   - Deep Crawl: Fetches links inside linked pages recursively - up to 500 pages. Use this for large sites where you want maximum coverage.
   - Custom Headers: Add custom HTTP headers to the crawl request. This is useful if the page is behind a soft authentication layer or requires specific headers to load correctly. If you're not sure what this means, you don't need it.
6. Click Start Crawling.

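The values you'd enter under Custom Headers are ordinary HTTP header key/value pairs. If you want to confirm locally that a page loads with a given header before entering it in the crawl form, here's a sketch using Python's standard urllib - the header names and token below are hypothetical examples, not values Outlearn requires:

```python
import urllib.request

# Hypothetical custom headers - the same key/value pairs you would
# enter in the Custom Headers fields of the crawl dialog.
headers = {
    "User-Agent": "ExampleCrawler/1.0",
    "X-Access-Token": "example-token",  # stand-in for a soft-auth header
}

req = urllib.request.Request("https://example.com/docs", headers=headers)
# Uncomment to actually fetch the page with those headers:
# with urllib.request.urlopen(req) as resp:
#     html = resp.read()
```

If the page renders correctly with the headers but not without them, those are the headers to add to the crawl.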
Outlearn will crawl the URL and process the content. Depending on the size of the site, this may take a few minutes. Once done, it will appear in your Sources list.
Only publicly accessible pages can be crawled. Password-protected pages, pages behind a login wall, or pages that block automated crawlers will not be accessible.
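Outlearn doesn't document its crawler internals, but as a rough mental model, a domain-scoped crawl with a page cap behaves like this breadth-first sketch. The `get_links` callable stands in for fetching a page and extracting its links, which keeps the sketch free of network access:

```python
from collections import deque
from urllib.parse import urlparse

def same_domain(seed: str, url: str) -> bool:
    """True when url lives on the same host as the seed URL."""
    return urlparse(url).netloc == urlparse(seed).netloc

def crawl(seed, get_links, max_pages=500):
    """Breadth-first crawl from seed, staying on one domain.

    get_links(url) should return the list of links found on that page.
    max_pages mirrors the 500-page ceiling mentioned for Deep Crawl.
    """
    seen, queue, order = {seed}, deque([seed]), []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            if link not in seen and same_domain(seed, link):
                seen.add(link)
                queue.append(link)
    return order
```

The same-domain filter is why starting the crawl at a section URL (e.g., a /support path) keeps the import focused: links pointing off that host are never queued.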
After Importing
Once your file or URL is processed, it will appear in your Sources list like any other source. From there you can:
- Change accessibility using the dropdown next to the source.
- Toggle it on or off using the In Use toggle.
- Delete it using the three-dot menu.
Manual imports do not have a Resync option - because they don't auto-sync, there's nothing to resync. To update the content, delete the existing source and re-import the updated file or re-crawl the URL.
Best Practices
- Name your files clearly before uploading - the file name becomes the source name in your Sources list, so "Q4 Product FAQ v3 FINAL.pdf" is harder to manage than "Product FAQ.pdf".
- Re-import files promptly when the underlying document changes - your agent will keep using the old version until you do.
- For website crawls, start with Crawl Links On This Page before trying Deep Crawl - it covers most sites well and is faster to process.
- Avoid crawling your entire website if only part of it is relevant to your agent. Crawl specific sections (e.g., yoursite.com/support) rather than the root domain to keep your sources focused.