Automatic data type detection
The add method automatically tries to detect the data_type based on the value you pass as the source argument. So app.add('https://www.youtube.com/watch?v=dQw4w9WgXcQ') is enough to embed a YouTube video.
This detection is implemented for all formats. It is based on factors such as whether it’s a URL, a local file, the source data type, etc.
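To make the idea concrete, here is a minimal sketch of how detection of this kind could work. It is not Embedchain's actual implementation: the function detect_data_type, its rules, and the labels it returns are assumptions made for illustration only.

```python
import logging
import os
from urllib.parse import urlparse

logger = logging.getLogger("detection_sketch")


def detect_data_type(source: str) -> str:
    """Hypothetical, simplified detector (not Embedchain's real logic)."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        host = parsed.netloc.lower()
        if "youtube.com" in host or "youtu.be" in host:
            result = "youtube_video"
        elif parsed.path.lower().endswith(".pdf"):
            result = "pdf_file"
        else:
            result = "web_page"
    elif os.path.isfile(source):
        # An existing local path; the label is illustrative only.
        result = "local_file"
    else:
        # Anything else, including a mistyped path, falls back to raw text.
        result = "text"
    # A debug-level message makes the decision visible when debugging.
    logger.debug("source=%r detected as %s", source, result)
    return result
```

In this sketch a path that does not exist silently falls through to the text branch, which is the kind of misdetection that only shows up in debug logs.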
Debugging automatic detection
Set log_level: DEBUG in the config yaml to check whether the data type detection is working correctly. Otherwise, you will not know when, for instance, an invalid filepath is interpreted as raw text instead.
Forcing a data type
To avoid any issues with the data type detection, you can force a data_type by passing it as an argument to the add method.
The examples below show the keyword that forces each data_type.
Forcing can also be used for edge cases, such as interpreting a sitemap as a web_page, for reading its raw text instead of following links.
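The sitemap edge case can be sketched as follows. Both functions are hypothetical stand-ins written for this illustration, not Embedchain code; the only point is that an explicit data_type takes precedence over whatever detection would have chosen. The commented-out line shows the form of the call this section describes.

```python
from typing import Optional


def detect_data_type(source: str) -> str:
    # Toy stand-in for automatic detection (assumption, not the real rules).
    if source.endswith("sitemap.xml"):
        return "sitemap"
    return "web_page"


def resolve_data_type(source: str, data_type: Optional[str] = None) -> str:
    # A forced data_type bypasses detection entirely.
    if data_type is not None:
        return data_type
    return detect_data_type(source)


url = "https://example.com/sitemap.xml"
auto = resolve_data_type(url)                          # detection decides
forced = resolve_data_type(url, data_type="web_page")  # caller decides

# With an Embedchain app, forcing takes the form:
# app.add(url, data_type="web_page")
```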
Remote data types
Use local files in remote data types
Some data_types are meant for remote content and only work with URLs. You can pass local files by formatting the path using the file: URI scheme, e.g. file:///info.pdf.
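If you would rather not build the URI by hand, Python's standard library can produce it. This is a small sketch; the helper name to_file_uri is an assumption, and info.pdf is only a placeholder that does not need to exist for the conversion to work.

```python
from pathlib import Path


def to_file_uri(path: str) -> str:
    # as_uri() requires an absolute path, so resolve it first.
    return Path(path).resolve().as_uri()


uri = to_file_uri("info.pdf")
print(uri)
```

The resulting string can then be passed as the source in place of an http(s) URL.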
Reusing a vector database
Default behavior is to create a persistent vector db in the directory ./db. You can split your application into two Python scripts: one to create a local vector db and the other to reuse this local persistent vector db. This is useful when you want to index hundreds of documents and separately implement a chat interface.
Create a local index:
Resetting an app and vector database
You can reset the app by calling the reset method. This will delete the vector database and all other app-related files.