Artificial intelligence (AI) and natural language processing (NLP) remain live topics of discussion across professional services. There are many related concepts that library and information professionals should be aware of, but here I am going to focus on AI and, in particular, NLP.
Simply put, NLP can be defined as the ability of a computer programme to understand human language as it is spoken or written, i.e. to process natural language. As such, natural language processing is just one of the many branches of artificial intelligence.
Instead of communicating with computer programmes in a technical command language, NLP enables a more natural form of communication. Well-known everyday examples of NLP can be found in Apple’s Siri or Amazon’s Alexa - both tools can understand and respond to natural human speech.
What use does natural language processing have in information services? There are many different ways to incorporate NLP into your library, primarily in support of your search function. In this article, we shall explore four of the ways NLP can be used - through sentiment analysis, entity extraction, keyword searching and concept extraction.
NLP adds a whole new dynamic to traditional Boolean searching. Instead of matching only the literal words of a query, NLP takes the wider context and intent of the word or term into account. This makes it easier to surface a highly specific set of search results containing only the most relevant information, so your search becomes more efficient.
In this regard, natural language processing is able to interpret the overarching mood of an article. This wouldn’t be possible without the use of NLP (unless done manually of course), since the system is now able to go beyond viewing the article as a compilation of individual words and can instead consider the article as a whole.
Such a process is known as sentiment analysis. Using sentiment analysis would enable you to assess whether articles written about your organisation, clients or competitors are positive or negative in their coverage. You can monitor the overall perception of your organisation or entity of interest in the news.
This could be of particular use for the marketing department, which needs to have a clear understanding of the organisation’s image. Or it could be used to monitor mentions of your clients, which would be fed back to your fee-earners. They can then be alerted to reputational issues, litigation matters or anything else that could affect a client’s public image.
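To make the idea concrete, here is a minimal sketch of sentiment scoring in Python. It is a deliberately crude stand-in: it counts words from two tiny, hypothetical word lists (POSITIVE and NEGATIVE are invented for this illustration), whereas real sentiment analysis tools rely on much larger lexicons or trained models that weigh context across the whole article.

```python
import re

# Hypothetical mini-lexicons, for illustration only; real systems use far
# larger word lists or trained models that take context into account.
POSITIVE = {"growth", "award", "praised", "success", "strong"}
NEGATIVE = {"lawsuit", "fraud", "scandal", "loss", "criticised"}


def sentiment(article: str) -> str:
    """Label an article positive, negative or neutral from simple word counts."""
    words = re.findall(r"[a-z']+", article.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A monitoring workflow could run a function like this over each incoming article about a client and alert the relevant fee-earner whenever the label comes back negative.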
A second manifestation of natural language processing in your library can be found in what is known as entity extraction. Since NLP interprets the context and article as a whole, it is able to tag each piece with specific entities, such as a geographic location, e.g. United Kingdom or London.
This is far more advanced than simply using such a term in your Boolean search criteria. Instead of searching for mentions of “United Kingdom”, the system searches only for articles that are actually about the United Kingdom, so a narrower set of much more relevant results is surfaced.
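The difference between a mention and a tag can be sketched in a few lines of Python. This is only an approximation under stated assumptions: the GAZETTEER lookup and the min_mentions threshold are hypothetical, and a simple mention count stands in for the contextual judgement a real entity extraction system would make about what an article is about.

```python
import re
from collections import Counter

# Hypothetical gazetteer mapping surface forms to a canonical location entity.
GAZETTEER = {
    "united kingdom": "United Kingdom",
    "uk": "United Kingdom",
    "britain": "United Kingdom",
    "london": "London",
}


def tag_locations(article: str, min_mentions: int = 2) -> set:
    """Tag an article with locations it mentions at least `min_mentions` times.

    Counting mentions is a crude proxy for "the article is about this place".
    """
    text = article.lower()
    counts = Counter()
    for surface, entity in GAZETTEER.items():
        # Whole-word match so that "uk" is not found inside longer words.
        counts[entity] += len(re.findall(rf"\b{re.escape(surface)}\b", text))
    return {entity for entity, n in counts.items() if n >= min_mentions}
```

Under this sketch, an article that refers to the UK, Britain and the United Kingdom several times is tagged "United Kingdom", while an article with a single passing reference receives no tag, even though a Boolean search for the phrase would have returned it.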
This ties in nicely with keyword tagging and searching, whereby NLP is able to go beyond mere mentions of a term in an article and instead consider whether that article is substantially focussed on, let’s say, Apple, or only mentions the company in passing. Using such keyword searching also enables you to filter out a great deal of the noise and avoid bringing up results that mention, in this case, apple the fruit rather than Apple the company.
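As a rough illustration of how a system might tell the two apples apart, the Python sketch below checks which context words appear alongside the term. The cue lists are hypothetical and far too small for real use; production systems would typically learn these associations from data rather than from hand-written lists.

```python
import re

# Hypothetical context cues for the two senses of "apple".
COMPANY_CUES = {"iphone", "mac", "shares", "ceo", "software", "technology"}
FRUIT_CUES = {"orchard", "pie", "juice", "harvest", "fruit", "cider"}


def apple_sense(article: str) -> str:
    """Guess whether an article refers to Apple the company or apple the fruit."""
    words = set(re.findall(r"[a-z]+", article.lower()))
    if "apple" not in words and "apples" not in words:
        return "not mentioned"
    company = len(words & COMPANY_CUES)
    fruit = len(words & FRUIT_CUES)
    if company > fruit:
        return "company"
    if fruit > company:
        return "fruit"
    return "unclear"
```

A search for Apple the company could then discard any article where the function returns "fruit", which a plain keyword match cannot do.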
Fourthly, natural language processing can categorise concepts into themes for you, even if they are not mentioned directly in the text itself. An example of such concept extraction could be found in searching for articles relating to a specific industry.
Using NLP, the system is able to recognise brands and company names contained in the article text and place these into the appropriate industry. You are then able to search for that specific industry, and all of these relevant articles will come up in your results even if the industry itself isn’t directly mentioned.
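The brand-to-industry step described above can be sketched as follows. The BRAND_INDUSTRY table and both function names are hypothetical, chosen for illustration; a real system would recognise company names with a trained model and draw on a much larger, maintained classification.

```python
# Hypothetical brand-to-industry lookup, for illustration only.
BRAND_INDUSTRY = {
    "Toyota": "automotive",
    "Volkswagen": "automotive",
    "Pfizer": "pharmaceuticals",
    "Novartis": "pharmaceuticals",
}


def industries(article: str) -> set:
    """Return the industries implied by brand names found in the article text."""
    # Simple case-sensitive substring match; real systems use entity recognition.
    return {ind for brand, ind in BRAND_INDUSTRY.items() if brand in article}


def search_by_industry(articles: dict, industry: str) -> list:
    """Return titles of articles whose brands map to the requested industry."""
    return [title for title, text in articles.items() if industry in industries(text)]
```

For example, given an article whose text is "Toyota recalls hybrid models", a search for "automotive" would return it even though the word "automotive" never appears in the text.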