Natural language processing transforms unstructured natural-language text into a structured, language-independent representation. In our system, this means identifying entities and events, along with the times associated with those events, so that the available information is easier for a human to understand. Once these entities and events are placed in context, the machine can structure the information and reveal connections to more technical threat indicators such as IP addresses, hashes, and domains.
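To make the idea of turning free text into structured records more concrete, here is a minimal sketch in Python. It is not Recorded Future's pipeline: it covers only the last step mentioned above (surfacing technical indicators and dates) using regular expressions, and the function name `extract_indicators`, the patterns, and the sample sentence are all assumptions made for illustration. Real entity and event extraction would need far more than pattern matching.

```python
import re

# Hypothetical patterns for illustration only; production systems need
# stricter validation (defanged indicators, IPv6, TLD lists, and so on).
IPV4 = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)
# MD5 (32), SHA-1 (40), or SHA-256 (64) hex digests.
HASH = re.compile(r"\b(?:[a-fA-F0-9]{64}|[a-fA-F0-9]{40}|[a-fA-F0-9]{32})\b")
# Dot-separated labels ending in an alphabetic top-level label.
DOMAIN = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}\b",
    re.IGNORECASE,
)
# ISO-style dates such as 2017-03-02.
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract_indicators(text: str) -> dict[str, list[str]]:
    """Return a structured record of the indicators found in free text."""
    return {
        "ipv4": IPV4.findall(text),
        "hashes": HASH.findall(text),
        "domains": DOMAIN.findall(text),
        "dates": DATE.findall(text),
    }


sample = (
    "On 2017-03-02, the group used 203.0.113.7 and evil-update.example.com "
    "to deliver a payload with MD5 d41d8cd98f00b204e9800998ecf8427e."
)
found = extract_indicators(sample)
```

The returned dictionary is a small example of a structured, language-independent representation: the same keys apply whatever language the surrounding sentence was written in.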
The graphic below illustrates the phases of natural language processing inside Recorded Future. We’ve developed a machine-learning module that first determines which text is relevant and which should be ignored, stripping away advertising and links to unrelated content.
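As a rough illustration of what such a relevance filter does, the sketch below uses simple hand-written rules in Python. This is a rule-based stand-in, not the machine-learning module described above: the `Block` type, the `AD_MARKERS` list, and the thresholds are hypothetical choices made only to show the shape of the step.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One chunk of page text (hypothetical structure for this sketch)."""
    text: str
    link_text_chars: int  # characters of the text that sit inside hyperlinks


# Assumed phrases that often mark advertising or navigation.
AD_MARKERS = ("sponsored", "advertisement", "subscribe now", "related articles")


def is_relevant(block: Block, max_link_density: float = 0.5, min_words: int = 5) -> bool:
    """Keep blocks that look like body text rather than ads or link lists."""
    text = block.text.strip()
    if len(text.split()) < min_words:
        return False  # too short to be meaningful body text
    lowered = text.lower()
    if any(marker in lowered for marker in AD_MARKERS):
        return False  # contains an advertising or navigation phrase
    link_density = block.link_text_chars / len(text)
    return link_density <= max_link_density  # mostly-link blocks are dropped


def filter_blocks(blocks: list[Block]) -> list[Block]:
    """Return only the blocks judged relevant, preserving their order."""
    return [b for b in blocks if is_relevant(b)]


page = [
    Block("Sponsored: upgrade your antivirus today with this special offer", 10),
    Block("Researchers observed the malware contacting a new command server.", 0),
    Block("Home News Sports Weather Contact About", 38),
]
kept = filter_blocks(page)
```

A learned classifier would replace `is_relevant` with a model trained on labeled examples, but its input and output would look much the same: blocks of text in, a subset of relevant blocks out.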