I have a question about searching metadata for our members’ documents. One of our engineers recalls an expert suggesting that once we fully populate Box with our documents (approximately 9 TB, or ~200 million files) and their associated metadata, we would no longer be able to use the Box Metadata API endpoint to search a custom metadata field (memberId) for a unique member ID (an internal identifier) and get back the collection of Box file IDs that belong to that member.
Instead, he suggested we build and maintain an external SQL database that maps member metadata to Box folder and file IDs. Is there any reason we cannot use the Box APIs to run these searches performantly, rather than building an external metadata management system to track this? We are already aware of the search limitations and the need for indices.
Could you clarify the best way to handle this, what the true limitations are (if any), and how we can overcome them?
We’ve discussed a similar topic recently.
Can you shed some light on the subject?
Hi @ahayat! From a straight-up “number of files” perspective, there’s no limitation that would disqualify using the Metadata Query API to find your content. In fact, we have customers with hundreds of millions of files using the Query API without issue.
That said, the expert may have suggested that approach because of a different scale consideration. I’ve seen this recommendation when, for example, the rate limits in place are too low to support a given use case.
So I’m afraid I can’t give you an exact answer on the right approach for your situation. As of today, the limitations you need to consider are the ones detailed on our Limitations page (with one caveat; read on).
You mentioned you were aware of the “need for indices”, so I wanted to clarify something. We recently completed an infrastructure upgrade that no longer requires us to construct indices manually. You can now build whatever queries you like, regardless of scale, without contacting the Box team to have those indices created. You can simply execute your queries! Some of our documentation, like the page I linked above, is still being updated. Any references you might see to needing indices, or to a “more than 10,000 instances” threshold, no longer apply.
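For illustration, here is a minimal sketch of what a memberId lookup against the Metadata Query API could look like, paging through results with the marker returned by each response. It assumes the `POST https://api.box.com/2.0/metadata_queries/execute_read` endpoint with the `from`, `query`, `query_params`, `ancestor_folder_id`, `limit`, and `marker` body fields; the template name `enterprise_12345.memberTemplate`, the folder ID, and the helper names are hypothetical placeholders, and authentication (obtaining the bearer token) is left out. The HTTP call is passed in as a function so the paging logic can be exercised without network access.

```python
import json
import urllib.request
from typing import Callable, Iterator, Optional

QUERY_URL = "https://api.box.com/2.0/metadata_queries/execute_read"


def build_query_body(template: str, member_id: str, ancestor_folder_id: str,
                     limit: int = 100, marker: Optional[str] = None) -> dict:
    """Build a metadata query body matching files whose memberId equals member_id.

    `template` is the scope.templateKey string, e.g. the hypothetical
    "enterprise_12345.memberTemplate".
    """
    body = {
        "from": template,
        # Bind the value as a named parameter instead of splicing it into the query string.
        "query": "memberId = :memberId",
        "query_params": {"memberId": member_id},
        "ancestor_folder_id": ancestor_folder_id,
        "limit": limit,
    }
    if marker:
        body["marker"] = marker  # continue from the previous page
    return body


def http_post(token: str) -> Callable[[dict], dict]:
    """Return a function that POSTs a query body to Box and decodes the JSON reply."""
    def post(body: dict) -> dict:
        req = urllib.request.Request(
            QUERY_URL,
            data=json.dumps(body).encode("utf-8"),
            headers={"Authorization": f"Bearer {token}",
                     "Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read().decode("utf-8"))
    return post


def file_ids_for_member(post: Callable[[dict], dict], template: str,
                        member_id: str, ancestor_folder_id: str) -> Iterator[str]:
    """Yield the IDs of all files tagged with member_id, following next_marker."""
    marker: Optional[str] = None
    while True:
        page = post(build_query_body(template, member_id, ancestor_folder_id,
                                     marker=marker))
        for entry in page.get("entries", []):
            if entry.get("type") == "file":  # skip any folders the query matches
                yield entry["id"]
        marker = page.get("next_marker")
        if not marker:  # no marker means this was the last page
            break
```

Usage would look like `list(file_ids_for_member(http_post(token), "enterprise_12345.memberTemplate", "M-000123", "123456"))`. Because results arrive in pages, a member with many documents costs several requests, which is where the rate limits mentioned above come into play.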