Marqo Cloud is built to be highly available. Configure indexes with replicas or multiple inference pods for redundancy. See our SLA for more details on availability and support.
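As a minimal sketch of what redundancy configuration looks like, the settings below show an index with a storage replica and two inference pods. The field names (`number_of_replicas`, `number_of_inferences`) are assumptions for illustration; check the Marqo Cloud documentation for the exact schema your plan uses.

```python
# Hypothetical settings for a highly available Marqo Cloud index.
# Field names are illustrative assumptions, not a confirmed schema.
ha_index_settings = {
    "number_of_replicas": 1,    # a second copy of the stored vectors
    "number_of_inferences": 2,  # two inference pods for redundancy
}

print(ha_index_settings)
```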
Marqo Cloud scales to meet your needs: you can expect low-latency searches across millions of documents at high request velocity. Marqo searches include the inference needed to create your vectors.
Manage access and API keys for members of your organisation with the Marqo Cloud console. Your API keys secure your Marqo endpoint.
Vector generation and management are included out of the box. Marqo Cloud pricing consists of two parts: storage and inference, each of which you can scale to meet your needs. Billing is determined by the per-hour price of your chosen instances multiplied by the number of instances you allocate.
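The billing rule above is simple arithmetic, sketched below with made-up placeholder rates (they are not real Marqo Cloud prices):

```python
# Illustrative bill: per-hour price x instance count, summed over
# storage and inference. Rates are placeholders, not real prices.
storage_price_per_hour = 0.25    # placeholder $/hr per storage instance
inference_price_per_hour = 0.50  # placeholder $/hr per inference instance

storage_instances = 2
inference_instances = 3
hours = 24 * 30  # roughly one month

monthly_cost = hours * (
    storage_instances * storage_price_per_hour
    + inference_instances * inference_price_per_hour
)
print(f"${monthly_cost:.2f}")  # $1440.00
```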
Storage refers to the hardware that hosts your vectors and serves searches over them. Your storage scales with the size of your data. With Marqo you can pick from three tiers of storage: basic, balanced, or performance.
Marqo is a documents-in, documents-out system: inference hardware converts your documents into vectors for you. CPU instances are recommended for smaller models or where latency is not crucial; GPU instances are recommended for larger models where low latency is critical.
As you grow, you can scale your storage capacity and inference throughput by increasing your number of instances. Swap between CPU and GPU inference to customise your cost, concurrency, and latency behaviours. You can even scale to zero if you are not actively using an index.
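To see how instance counts drive cost, including scale-to-zero, here is a small sketch using the same placeholder rates as above (again, not real Marqo Cloud prices):

```python
# How scaling instance counts changes the hourly bill.
# Rates are illustrative placeholders, not real Marqo Cloud prices.
def hourly_cost(storage_instances: int, inference_instances: int,
                storage_rate: float = 0.25,
                inference_rate: float = 0.50) -> float:
    """Hourly cost = per-instance rate x instance count, per resource."""
    return (storage_instances * storage_rate
            + inference_instances * inference_rate)

peak = hourly_cost(storage_instances=2, inference_instances=4)
idle = hourly_cost(storage_instances=2, inference_instances=0)  # inference scaled to zero
print(peak, idle)
```

With inference scaled to zero, you pay only for the storage instances that keep your index available.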