Dahu ARC

Dahu ARC is our solution for managing your unstructured content to meet your GDPR obligations. The General Data Protection Regulation (GDPR) came into effect on 25 May 2018. Its scope is considerably wider than that of the previous Data Protection Act, and so are the potential fines for getting it wrong!

The GDPR defines specific measures that must be taken with 'personal' and 'sensitive' data wherever that data is held, and this now extends to unstructured repositories such as file shares and email.

Find the data you hold

The GDPR requires you to document what personal data you hold, where it came from, who you share it with and what you do with it.

The Information Commissioner's Office (ICO) has published a checklist that organisations can use to review their compliance with the GDPR. The first step on this list is to understand what information you hold and to document it.

You can use Dahu ARC to run a dedicated GDPR discovery of personal and sensitive data. Gather and process content from across all your unmanaged data repositories, including email inboxes, file shares, document archives and cloud drives. Find, tag and score the content according to its type and level of sensitivity, then use the results to inform your data audit and GDPR readiness reports.
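To illustrate the tag-and-score step, here is a minimal Python sketch that turns the items detected in a document into a sensitivity score and a coarse risk band. The category names, weights and band thresholds are hypothetical assumptions for illustration, not Dahu ARC's actual scoring model.

```python
from dataclasses import dataclass, field

# Hypothetical weights: special-category ("sensitive") data counts for more
# than ordinary personal identifiers. Illustrative values only.
CATEGORY_WEIGHTS = {
    "email": 1,
    "phone": 1,
    "postal_address": 1,
    "health": 5,
    "religious_belief": 5,
    "ethnic_origin": 5,
}


@dataclass
class ScannedDocument:
    path: str
    tags: list[str] = field(default_factory=list)  # one tag per detected item


def sensitivity_score(doc: ScannedDocument) -> int:
    """Sum the weight of every detected item; unknown tags count as 1."""
    return sum(CATEGORY_WEIGHTS.get(tag, 1) for tag in doc.tags)


def risk_band(score: int) -> str:
    """Map a raw score onto a coarse band for audit reports (hypothetical thresholds)."""
    if score == 0:
        return "none"
    if score < 5:
        return "low"
    if score < 15:
        return "medium"
    return "high"


doc = ScannedDocument("hr/letters/offer.docx", ["email", "email", "health"])
print(sensitivity_score(doc), risk_band(sensitivity_score(doc)))  # 7 medium
```

Weighting special-category data more heavily means a single health reference counts for more than an ordinary identifier, which is usually the ordering a data audit wants.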

Assess the Risk

The GDPR requires an ongoing commitment to assess every activity that may put the lawful, fair and transparent processing of data at risk, and to put suitable measures in place.

The Data Protection Impact Assessment (DPIA) is the specific process the GDPR mandates for this. A DPIA is required before you change existing processing, or introduce new processing, that is likely to result in a high risk to the rights and freedoms of data subjects.

You can use Dahu ARC to run a targeted GDPR personal data discovery on content likely to be used or exposed by a proposed change to existing processing, or by a new activity. Find the personal data, flag and tag it, and use the results to assess and mitigate the risks identified.

Take Action

The GDPR states that organisations must enable data subjects "to be aware of, and verify, the lawfulness of the processing". On request, the data controller must provide the data subject with a copy of the personal data undergoing processing, together with relevant related information.

You can use Dahu ARC to respond to Subject Access Requests (SARs). The Dahu ARC dashboard lets you immediately run a dedicated, advanced SAR search for all data relating to a specific request. You can then review the results in the dashboard so that the relevant content can be found, redacted or anonymised where necessary, and collated in a compliant format for release to the subject.
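The SAR workflow can be sketched roughly as two steps: find the documents that mention the subject, then redact other people's personal data before release. This is a minimal sketch, assuming detection has already produced typed spans with character offsets; the `Span` type and both functions are hypothetical and not part of any Dahu API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    kind: str   # e.g. "email"
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    value: str  # the matched text


def find_subject_documents(docs: dict, subject_ids: list[str]) -> list[str]:
    """Return ids of documents whose detected spans include a subject identifier.

    `docs` maps a document id to a (text, spans) pair.
    """
    wanted = {s.lower() for s in subject_ids}
    return [doc_id for doc_id, (_text, spans) in docs.items()
            if any(sp.value.lower() in wanted for sp in spans)]


def redact_third_parties(text: str, spans: list[Span], subject_ids: list[str]) -> str:
    """Replace personal data that does not belong to the subject with a marker."""
    wanted = {s.lower() for s in subject_ids}
    # Edit from the end of the text backwards so earlier offsets stay valid.
    for sp in sorted(spans, key=lambda s: s.start, reverse=True):
        if sp.value.lower() not in wanted:
            text = text[:sp.start] + f"[REDACTED {sp.kind}]" + text[sp.end:]
    return text


text = "From jo@example.com to sam@example.org"
spans = [Span("email", 5, 19, "jo@example.com"), Span("email", 23, 38, "sam@example.org")]
print(redact_third_parties(text, spans, ["jo@example.com"]))
# From jo@example.com to [REDACTED email]
```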

How does Dahu ARC work?

Dahu ARC combines several components to provide a complete data discovery solution: content connectors, advanced processing engines and user interfaces. ARC is designed to work with many commercial and open-source search engine platforms, and runs on Windows or Linux, on-premises or in the cloud, or in a combination of these. The components can be used to enrich an existing enterprise search solution, or to build a complete solution for data discovery and GDPR processing.


Dahu Edge


Dahu Edge is our unique series of content connectors, designed to find and gather content from all your unstructured repositories. Our connectors are built specifically to support and optimise data discovery. For instance, unlike normal search indexing, we keep a record of every duplicate instance of content so that you get a true picture of your data estate. Even when content would normally be skipped because of its size, type or security settings, we always create a record with all the available metadata. Available connectors include databases and file systems, with cloud storage connectors, including Google Docs and Microsoft OneDrive, coming soon.
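As a rough illustration of keeping every duplicate and always recording skipped files, the sketch below hashes content with SHA-256 and groups paths by digest, while still emitting a metadata-only record for content it cannot or will not read. The size limit, field names and functions are hypothetical; this is not how Dahu Edge is implemented.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

MAX_BYTES = 50 * 1024 * 1024  # hypothetical size limit for full processing


@dataclass
class FileRecord:
    path: str
    size: int
    digest: Optional[str]          # SHA-256 of the content, None if not read
    skipped_reason: Optional[str]  # why the content was not processed, if it wasn't


def record_file(path: str, size: int, content: Optional[bytes]) -> FileRecord:
    """Always return a record, even when the content itself is skipped."""
    if content is None:  # e.g. access denied by file-system security
        return FileRecord(path, size, None, "unreadable")
    if size > MAX_BYTES:
        return FileRecord(path, size, None, "too large")
    return FileRecord(path, size, hashlib.sha256(content).hexdigest(), None)


def group_duplicates(records: list[FileRecord]) -> dict[str, list[str]]:
    """Map each content digest to every path holding that content."""
    groups = defaultdict(list)
    for rec in records:
        if rec.digest is not None:
            groups[rec.digest].append(rec.path)
    return dict(groups)


records = [
    record_file("shares/hr/policy.pdf", 3, b"abc"),
    record_file("archive/2017/policy-copy.pdf", 3, b"abc"),
    record_file("shares/finance/locked.xlsx", 10, None),
]
for digest, paths in group_duplicates(records).items():
    print(digest[:12], paths)
```

Keying on a content hash rather than the file name is what lets two differently named copies in different locations be reported as the same content.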

Dahu Vector


To assess the level of risk in your unstructured content, you need to identify all the personal and sensitive data it holds and make that data available for analysis. This is what Dahu Vector is designed to do. Its extensive rule base lets it discover all the GDPR-stipulated sensitive data types and personal references in any content that flows through it. It relies on a series of complementary identification technologies, including machine learning, natural language processing (NLP) and pattern matching. Because it is vital to be able to understand and explain the decisions that processing systems take, we designed Vector's processes to be fully auditable.
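The pattern-matching part of this approach, together with an audit trail, might look something like the sketch below: each match records the rule that fired and the exact character offsets, so any decision can be traced back and explained. The rule ids and both regular expressions are hypothetical and deliberately simplified (the National Insurance pattern, for example, does not enforce the real prefix rules); Vector's actual rule base, and its machine learning and NLP components, are not shown.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    rule_id: str  # which rule fired, kept for the audit trail
    kind: str     # the personal data type the rule detects
    start: int    # character offset where the match begins
    end: int      # character offset just past the match
    text: str     # the matched text itself


# Hypothetical, deliberately simplified rules (not Dahu Vector's rule base).
RULES = [
    ("R001", "email", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("R002", "uk_ni_number", re.compile(r"\b[A-Z]{2} ?\d{2} ?\d{2} ?\d{2} ?[A-D]\b")),
]


def detect(text: str) -> list[Match]:
    """Run every rule over the text and return all matches in document order."""
    found = [
        Match(rule_id, kind, m.start(), m.end(), m.group())
        for rule_id, kind, pattern in RULES
        for m in pattern.finditer(text)
    ]
    return sorted(found, key=lambda m: m.start)


for match in detect("Contact jane.doe@example.co.uk, NI number QQ 12 34 56 C."):
    print(match.rule_id, match.kind, match.start, match.end, match.text)
```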

Dahu Surface


To make use of the data discovered with Dahu Edge connectors and Dahu Vector, you need supporting applications focused on specific GDPR tasks. Dahu Surface is our user interface platform: it lets us provide search tools and dashboards that support the SAR process, and track risk during your initial risk assessment or when you run a Data Protection Impact Assessment (DPIA) before a content processing task. Surface also provides user interface API translation, so our interfaces can work on most current search engine technologies.
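UI API translation can be pictured as mapping one engine-neutral query onto each engine's native request format. The sketch below builds an Elasticsearch Query DSL body (a `bool` query with a `terms` filter) and Solr request parameters (`q`, `df`, `fq`) from the same object. The `DiscoveryQuery` type and the `content` and `pii_types` field names are hypothetical assumptions, not Surface's real interface.

```python
from dataclasses import dataclass, field


@dataclass
class DiscoveryQuery:
    """Hypothetical engine-neutral query: free text plus personal-data type filters."""
    text: str
    pii_types: list[str] = field(default_factory=list)


def to_elasticsearch(q: DiscoveryQuery) -> dict:
    """Build an Elasticsearch Query DSL body: a bool query with a terms filter."""
    bool_query: dict = {"must": [{"match": {"content": q.text}}]}
    if q.pii_types:
        # Filter clauses restrict matches without affecting relevance scores.
        bool_query["filter"] = [{"terms": {"pii_types": q.pii_types}}]
    return {"query": {"bool": bool_query}}


def to_solr(q: DiscoveryQuery) -> dict:
    """Build Solr request parameters: main query, default field, filter query."""
    params = {"q": q.text, "df": "content"}
    if q.pii_types:
        params["fq"] = "pii_types:(" + " OR ".join(q.pii_types) + ")"
    return params


query = DiscoveryQuery("jane doe", ["email", "health"])
print(to_solr(query)["fq"])  # pii_types:(email OR health)
```

Keeping the neutral query separate from each engine's format means a user interface written once can sit in front of different back ends.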

How does Dahu ARC fit in your environment?

Dahu takes a very consultative approach to providing solutions, and we have a long history of providing search, discovery and analytics consultancy to many customers. With that in mind, we designed ARC to work either as a complete solution for GDPR data discovery or as standalone components integrated into existing systems. Let's discuss a few possible scenarios for using Dahu Edge connectors, the Dahu Vector processing engine and Dahu Surface UI services.

Add Personal and Sensitive Data Discovery to your existing solutions

You probably already have a significant investment in enterprise search technologies, possibly already working hard on data discovery. You might also have other big-data solutions that process your unstructured or unmanaged content. Perhaps you need to scan content as you prepare to migrate it to the cloud, or to bring it under control in a records management system. In scenarios like these, you can use Dahu Vector, our content processing platform, to identify personal and sensitive data in your content as it is processed, and use the identified elements to make risk-based decisions.

scenario 1 picture

In this scenario, we assume you have a fairly typical enterprise search infrastructure, with some content connectors, a pipeline processing environment and some existing search tools that use the indexed data. We can augment the processing in the pipeline by calling out to Dahu Vector to identify and mark up references to personal and sensitive data. This data, including the specific type of each item and its position in the content, can then be indexed and used in your search applications.
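A minimal sketch of this kind of pipeline augmentation, assuming a pipeline in which each stage takes and returns a document dictionary. The email regex here is a hypothetical stand-in for the call out to Dahu Vector, and the `pii_types` and `pii_positions` field names are illustrative; your search platform's pipeline API will differ.

```python
import re
from typing import Callable

Document = dict
Stage = Callable[[Document], Document]

# Hypothetical stand-in for a call out to a detection service such as Dahu Vector.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def normalise_whitespace(doc: Document) -> Document:
    """Example of an existing pipeline stage: collapse runs of whitespace."""
    return {**doc, "content": " ".join(doc.get("content", "").split())}


def pii_enrichment_stage(doc: Document) -> Document:
    """Add the type and position of each detected item as indexable fields."""
    positions = [
        {"type": "email", "start": m.start(), "end": m.end()}
        for m in EMAIL.finditer(doc.get("content", ""))
    ]
    return {
        **doc,
        "pii_types": sorted({p["type"] for p in positions}),
        "pii_positions": positions,
    }


def run_pipeline(doc: Document, stages: list[Stage]) -> Document:
    """Apply each stage in order; stages return new dicts rather than mutating."""
    for stage in stages:
        doc = stage(doc)
    return doc


indexed = run_pipeline(
    {"id": "42", "content": "Reply to   a@b.com"},
    [normalise_whitespace, pii_enrichment_stage],
)
print(indexed["pii_positions"])  # [{'type': 'email', 'start': 9, 'end': 16}]
```

Running the enrichment after normalisation matters: positions are only meaningful relative to the text that is actually indexed.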

This approach lets you build on your existing infrastructure and investment while meeting your Subject Access Request (SAR) obligations.

Add content for Data Discovery to your existing Search

Being able to assess the level of risk across all your content is an important part of GDPR readiness. This means you need to be able to connect to the data wherever it is. Dahu Edge connectors are designed to do this, and to record details of every file or document they find.

scenario 2 picture

Here, we use Dahu Edge connectors with an existing search system to extend its reach. The content might be on-premises or in the cloud. If it is in the cloud, we can run the Edge connectors in the cloud too, to avoid pulling all the content back out of it. Edge maintains its own state information on every file or document it finds, and automatically decides how often to revisit each content area to suit how frequently it changes. Edge works seamlessly with Vector to process the content before passing it to an indexer.
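Adaptive revisiting can be approximated with a simple heuristic: halve the revisit interval whenever an area has changed since the last visit, and double it when it has not, within fixed bounds. The sketch below is that generic heuristic under hypothetical bounds and names; it is not Edge's actual scheduling algorithm.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical bounds on the revisit interval, in hours.
MIN_INTERVAL_H = 1.0
MAX_INTERVAL_H = 24.0 * 30


@dataclass
class AreaState:
    path: str
    interval_h: float = 24.0
    last_fingerprint: Optional[str] = None  # e.g. a hash of the directory listing


def update_interval(state: AreaState, fingerprint: str) -> float:
    """Halve the interval after a change, double it after an unchanged visit."""
    if state.last_fingerprint is not None:
        if fingerprint != state.last_fingerprint:
            state.interval_h = max(MIN_INTERVAL_H, state.interval_h / 2)
        else:
            state.interval_h = min(MAX_INTERVAL_H, state.interval_h * 2)
    state.last_fingerprint = fingerprint  # the first visit only records a baseline
    return state.interval_h


area = AreaState("//fileserver/hr")
for fp in ["v1", "v1", "v1", "v2"]:
    print(update_interval(area, fp))  # 24.0, 48.0, 96.0, 48.0 on successive lines
```

The bounds stop a busy area from being polled continuously and a quiet one from being forgotten entirely.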

Our scenario focuses on integrating with search systems, but Edge and Vector are just as applicable to other data processing solutions, such as migration tools or big-data suites. Vector can be configured with a variety of 'output' plugins so that the discovered and enriched content can be directed wherever is most appropriate.
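The output-plugin idea can be sketched as a small interface that every destination implements, so enriched documents can be fanned out to several destinations at once. The class names and the list-based sink are hypothetical assumptions chosen to keep the example self-contained; a real plugin would write to an indexer, a migration tool or a file.

```python
import json
from abc import ABC, abstractmethod
from typing import Callable


class OutputPlugin(ABC):
    """Hypothetical interface: every destination implements write()."""

    @abstractmethod
    def write(self, doc: dict) -> None:
        ...


class JsonLinesOutput(OutputPlugin):
    """Serialise each enriched document as one JSON line into a list sink."""

    def __init__(self, sink: list[str]):
        self.sink = sink

    def write(self, doc: dict) -> None:
        self.sink.append(json.dumps(doc, sort_keys=True))


class FilteredOutput(OutputPlugin):
    """Forward only the documents that satisfy a predicate to another plugin."""

    def __init__(self, inner: OutputPlugin, predicate: Callable[[dict], bool]):
        self.inner = inner
        self.predicate = predicate

    def write(self, doc: dict) -> None:
        if self.predicate(doc):
            self.inner.write(doc)


def emit(doc: dict, plugins: list[OutputPlugin]) -> None:
    """Send one enriched document to every configured output."""
    for plugin in plugins:
        plugin.write(doc)


index_feed: list[str] = []
review_queue: list[str] = []
outputs = [
    JsonLinesOutput(index_feed),
    FilteredOutput(JsonLinesOutput(review_queue),
                   lambda d: "health" in d.get("pii_types", [])),
]
emit({"id": "1", "pii_types": ["health"]}, outputs)
emit({"id": "2", "pii_types": ["email"]}, outputs)
print(len(index_feed), len(review_queue))  # 2 1
```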

Deploy a full solution for GDPR Data Discovery

Of course, it's also possible to use Dahu ARC as a complete GDPR data discovery solution. This provides the necessary data connectivity and processing, along with interfaces to manage your SARs and dashboards to manage your DPIAs and initial gap analysis.

scenario 3 picture

Our interfaces (coming very soon) are built using best-of-breed web tools and techniques and are simple to install in most environments. They can run under our own Dahu Surface transformation engine, or under any application server or simple HTTP server.

Dahu Surface can host the search interfaces itself, translating in-line from popular search systems so that the interfaces work with existing tools without being re-coded. Surface also provides APIs that let other systems use the discovered personal and sensitive data and the associated risk profiles for the content.

Let's talk...

Dahu ARC is continually developing as we add new capabilities and features. We'd love the opportunity to discuss data discovery for GDPR, FOI or PCI with you.

Please contact us if you'd like to learn more.


Search. Only better.