Automated Redaction and Information Extraction Technology
Publishing or providing access to information held in databases and document repositories is crucial to both government and commercial entities, but documents must often be “scrubbed” of sensitive information before release.
The TeraDactor™ Automated Redaction/Extraction Engine automates the process of identifying and removing or extracting sensitive or useful information from documents or data sources. The redaction process is driven by business rules, initially defined by the user and then expanded through a self-learning process so that different permutations of sensitive information are identified and removed. Rule sets for removing an individual’s private information, such as Social Security number, address, and telephone number, are provided with TeraDactor and can be combined with user-defined rule sets for identifying and extracting other sensitive information.
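To make the idea of rule-driven redaction concrete, the following is a minimal sketch in Python. It is not TeraDactor's implementation or API: the rule names, the regular expressions, and the `redact` function are all hypothetical, chosen only to illustrate how a built-in rule set might be applied and combined with user-defined rules. The self-learning expansion of rules is not modeled here.

```python
import re

# Hypothetical built-in rule set (illustrative patterns, US formats only).
# These are assumptions for this sketch, not TeraDactor's actual rules.
RULES = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(
        r"(?<!\d)(?:\(\d{3}\)\s?|\d{3}[-.])\d{3}[-.]\d{4}(?!\d)"
    ),
}


def redact(text, rules=RULES):
    """Replace each rule match with a [LABEL] placeholder.

    Returns the redacted text and a list of (label, matched_text)
    pairs, which a reviewer could inspect before release.
    """
    found = []
    for label, pattern in rules.items():
        # Bind label as a default argument so each callback keeps its own.
        def _mark(match, label=label):
            found.append((label, match.group(0)))
            return f"[{label}]"

        text = pattern.sub(_mark, text)
    return text, found


# Combining the built-in rules with a hypothetical user-defined rule.
custom_rules = {**RULES, "CASE": re.compile(r"\bCase-\d+\b")}

redacted, hits = redact(
    "Case-42: SSN 123-45-6789, call (555) 123-4567.", custom_rules
)
print(redacted)
```

Returning the list of matches alongside the redacted text mirrors the review step described for the product: the matches can be shown to a human for approval rather than being discarded silently.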
Processed documents are available for manual review and approval, using the TeraDactor Visual Redaction Editor, or can be distributed immediately in their redacted form.
IT shops can integrate the Automated Redaction Engine into their enterprise document workflow, enabling the redaction of batches of documents on a pre-scheduled basis or when they are retrieved. Alternatively, in conjunction with the Visual Redaction Editor, users can redact, review, and edit individual documents as needed or as they are created. Redaction on creation is vital to government agencies, military operations, and law firms that are required to disclose or distribute sensitive information and that typically resort to redaction by hand on a document-by-document basis, a time-consuming and expensive process.
The TeraDactor Automated Redaction Engine provides seamless access to information in unstructured text in most document formats or databases to:
- Reduce document dissemination time and cost, making mission-critical information available to broad and diverse audiences in real time
- Access and correlate information from many document formats, including paper documents, Microsoft Word and Excel, PDF, DOC, XML, HTML and TIFF
- Eliminate the backlog of requests for documents, a benefit especially suited to Freedom of Information and legal disclosure situations