Today we launch "Word Search" functionality on Fame, the company database for the UK and Ireland.
This new feature allows you to perform highly tailored searches on text within the original documents of millions of British annual reports. And it goes well beyond the information that already feeds into the structured elements of the database.
As anyone who has trudged through some of the lengthier accounts submitted each year to HMRC will confirm, these documents can often contain some of the most revealing information in companies' financials. But, for file formatting reasons, they've traditionally been impossible to scan other than by eye – so this all-important but hidden-away material has remained extremely hard to find.
So, what's new and how can you capitalise on this with Fame?
Tagging – and its by-products
Searchability is something we've come to expect of text on the internet. But sometimes what we see as a piece of text, a computer will "see" as a meaningless image. This applies to unprocessed PDFs and similar file types, unless they are first converted with expensive software that takes time to run.
But in 2010 HMRC adopted a new filing process based on eXtensible Business Reporting Language (XBRL), specifically inline XBRL (iXBRL), a standard for tagging financial data. Typically, accounts are submitted in a normal viewable format, with lines of information conforming to the UK's prescribed accounting taxonomy. When the iXBRL software is run over these submissions, it picks up on an enormous and detailed list of recognised terms and tags them accordingly. This enables a single document to provide both human-readable and – as a potentially time-saving by-product – machine-readable data.
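To make the "one document, two audiences" idea concrete, here is a minimal sketch in Python of how tagged values might be read back out of an iXBRL page. The sample document, the `uk:` taxonomy prefix and its URL, and the `extract_tagged_facts` helper are all hypothetical and for illustration only. The sketch assumes the Inline XBRL 1.1 namespace; real filings may use an earlier one.

```python
import xml.etree.ElementTree as ET

# Assumption: Inline XBRL 1.1 namespace; older filings may declare a different one.
IX = "http://www.xbrl.org/2013/inlineXBRL"

# Hypothetical fragment: ordinary XHTML that a person can read in a browser,
# with iXBRL tags wrapped around the values a machine should pick up.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ix="http://www.xbrl.org/2013/inlineXBRL"
      xmlns:uk="http://example.com/hypothetical-taxonomy">
  <body>
    <p>Turnover for the year was
      <ix:nonFraction name="uk:Turnover" contextRef="FY" unitRef="GBP"
                      decimals="0">1,250,000</ix:nonFraction>.</p>
    <p><ix:nonNumeric name="uk:PrincipalActivities"
                      contextRef="FY">Retail of garden furniture</ix:nonNumeric></p>
  </body>
</html>"""


def extract_tagged_facts(xhtml):
    """Return {tag name: displayed text} for numeric and text iXBRL facts."""
    root = ET.fromstring(xhtml)
    facts = {}
    for kind in ("nonFraction", "nonNumeric"):
        for element in root.iter(f"{{{IX}}}{kind}"):
            # itertext() gathers the visible text, including any nested markup.
            facts[element.get("name")] = "".join(element.itertext()).strip()
    return facts


facts = extract_tagged_facts(SAMPLE)
for name, value in facts.items():
    print(name, "=", value)
```

The same file renders as a normal page for a human reader, while the `name` attributes give software a reliable handle on each value.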
Annual report information in the same tagged format makes its way to the Companies House database, and it's from this growing repository that Fame takes annual report data. We've engineered the new module to interrogate the vast dataset.
Searching this vast dataset in an instant
And it is vast. With 80% of UK-based companies now complying with the new process, we're approaching a critical mass. Including the recent archive, this brings the current number of searchable documents to around 8 million. So we were well motivated to build this tool.
How does it work?
In essence, Fame's "Word Search" feature bridges the gap between structured and unstructured data:
- You select appropriate operators – "Must have", "Must not" or "Should have" – from an expandable list of dropdown menus, next to which you enter free text to create your combined search;
- The "Should have" operator allows you to prioritise certain documents over others in your results based on whether they contain terms you deem non-compulsory; and
- You can type the "~" proximity operator between words to further refine your search.
These often-sophisticated searches are then run across all of the original documents in our collection in an instant.
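As a rough illustration of how the three operators and the proximity operator could combine, here is a minimal Python sketch. It is not Fame's implementation: the `search` function, its parameters and the sample documents are hypothetical, and the proximity rule (two words within a given number of word positions) is an assumption about what "~" means.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def within(tokens, first, second, max_gap):
    """True if `first` and `second` occur within `max_gap` word positions."""
    pos_a = [i for i, t in enumerate(tokens) if t == first]
    pos_b = [i for i, t in enumerate(tokens) if t == second]
    return any(abs(a - b) <= max_gap for a in pos_a for b in pos_b)


def search(docs, must=(), must_not=(), should=(), near=()):
    """Filter on must / must_not / near, then rank by number of `should` hits."""
    results = []
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        vocab = set(tokens)
        if not all(term in vocab for term in must):
            continue  # "Must have": every term is compulsory
        if any(term in vocab for term in must_not):
            continue  # "Must not": any hit excludes the document
        if not all(within(tokens, a, b, gap) for a, b, gap in near):
            continue  # proximity: the word pair must sit close together
        score = sum(term in vocab for term in should)  # "Should have": ranking only
        results.append((score, doc_id))
    results.sort(key=lambda r: (-r[0], r[1]))
    return [doc_id for _, doc_id in results]


# Hypothetical documents for illustration only.
docs = {
    "A": "The company operates a defined benefit pension scheme for staff.",
    "B": "The defined contribution scheme replaced the benefit plan; there was a deficit.",
    "C": "The defined benefit scheme was in deficit and is now dormant.",
    "D": "Defined benefit scheme deficit widened this year.",
}

hits = search(
    docs,
    must=("scheme",),
    must_not=("dormant",),
    should=("deficit",),
    near=(("defined", "benefit", 1),),
)
print(hits)  # ['D', 'A']
```

Document B is dropped because "defined" and "benefit" are too far apart, C because it contains an excluded word, and D outranks A because it also contains the optional term.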
Here's an example search, along with the first few results:
Helping you to focus your research
The tool has many applications. You might be an accountant seeking business development opportunities with companies whose reports contain the phrase "defined benefit", a government officer tasked with a specific line of enquiry relating to an arcane word or phrase, or an analyst conducting statistical analysis on bulk sets of data that would otherwise be far less rich. Whatever your use case, your possibilities have expanded.
In summary, with Fame's "Word Search" you can:
- Apply specific searches across the original documents of all registered company accounts that have been submitted through this system (> 80% of the total) in a matter of seconds, rather than going through them one at a time;
- Find particular legislative references, whether they relate to filing or regulatory requirements;
- Compare results from these searches with Fame's structured data for the same accounts, gaining a better understanding of the figures within them, such as "cost of goods sold" and "turnover"; and
- Perform very efficient, accurate and time-saving data interrogations, making use of multiple word-search steps and simple but versatile operators.