What are data provenance graphs?

We use data provenance graphs in auditing and intrusion detection for cyber security. These graphs describe the totality of system execution and help gather information regarding the data's origin, its present state, and who acted upon it.

Explanation

In data provenance analytics, we parse system logs, or audit logs, to create data provenance graphs. These graphs link related data together, help produce repeatable results, and present the data's full story in graphical form.

To study data provenance graphs, we first need to understand their components. Each vertex of the graph represents a system entity, such as a file, socket, or process. Each edge defines a causal relationship between two vertices.
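The vertex and edge structure described above can be sketched in a few lines of Python. This is a minimal illustration, not a standard implementation: the `Vertex` and `ProvenanceGraph` classes, the `kind` labels, and the `"write"` operation name are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Vertex:
    """A system entity (hypothetical model for illustration)."""
    kind: str  # for example "process", "file", or "socket"
    name: str


@dataclass
class ProvenanceGraph:
    # Maps each source vertex to a list of (operation, target vertex) pairs.
    edges: dict = field(default_factory=dict)

    def add_edge(self, source, operation, target):
        """Record that `source` causally affected `target` via `operation`."""
        self.edges.setdefault(source, []).append((operation, target))
        # Make sure the target is also known to the graph, even with no outgoing edges.
        self.edges.setdefault(target, [])


graph = ProvenanceGraph()
firefox = Vertex("process", "firefox")
download = Vertex("file", "Mal.exe")
graph.add_edge(firefox, "write", download)
```

Making `Vertex` frozen keeps it hashable, so vertices can be used directly as dictionary keys.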

Provenance graphs facilitate causal analysis by providing the following features:

Backward tracing: Starting from a suspicious event, this helps analysts follow the edges back to identify the root cause.
Forward tracing: Starting from a known entry point, this allows analysts to find the ramifications of the attack and prevent further attacks of the same type.
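Both kinds of tracing amount to graph traversal in opposite directions. The sketch below uses a breadth-first search over a small, hypothetical edge list; the vertex names and the `backward_trace` and `forward_trace` helpers are assumptions for illustration, not part of any particular tool.

```python
from collections import deque

# Hypothetical causal edges in the form (cause, effect).
EDGES = [
    ("bash", "firefox"),
    ("firefox", "Mal.exe"),
    ("Mal.exe", "Mal"),
    ("bash", "vim"),
]


def build_adjacency(edges, reverse=False):
    """Build an adjacency map; reverse=True flips every edge."""
    adjacency = {}
    for cause, effect in edges:
        src, dst = (effect, cause) if reverse else (cause, effect)
        adjacency.setdefault(src, []).append(dst)
    return adjacency


def trace(start, adjacency):
    """Breadth-first search returning the vertices reachable from `start`."""
    visited = []
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                visited.append(neighbor)
                queue.append(neighbor)
    return visited


def backward_trace(vertex, edges=EDGES):
    """Walk edges in reverse to find everything that led to `vertex`."""
    return trace(vertex, build_adjacency(edges, reverse=True))


def forward_trace(vertex, edges=EDGES):
    """Walk edges forward to find everything `vertex` went on to affect."""
    return trace(vertex, build_adjacency(edges))
```

With these edges, `backward_trace("Mal")` returns `["Mal.exe", "firefox", "bash"]`, and `forward_trace("firefox")` returns `["Mal.exe", "Mal"]`. The unrelated `vim` process appears in neither result, which is how tracing narrows an investigation.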

Example

The following illustration is an example of the execution of a data provenance graph:

Let's explain the execution process:

Userspace and Kernel: Here, the browser runs in userspace and interacts with the operating system through the kernel, the component of the operating system where all system calls are handled. System calls are requests made by programs that require operating system access to execute.
Audit log: It saves the list of events that occur after performing a certain action. An ID is first generated for the browser, and then the read and write operations are performed as required. These actions are analyzed to trace which function initiated a certain action.
Provenance graph: Bash is the shell where commands are executed. Bash executes Firefox, and Firefox downloads a Mal.exe file. Later, a Mal process is spawned from the Mal.exe file. We can check all the related functions and events, and use them to trace the attack.
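The path from audit log to provenance graph in this example can be sketched as follows. The record format here (`<subject> <operation> <object>`) is a deliberately simplified, hypothetical one invented for illustration; real audit logs carry many more fields, and the operation names are assumptions as well.

```python
# Hypothetical, simplified audit records: "<subject> <operation> <object>".
AUDIT_LOG = """\
bash execve firefox
firefox write Mal.exe
Mal.exe execve Mal
"""


def parse_audit_log(text):
    """Turn each well-formed record into a (subject, operation, object) edge."""
    edges = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # Skip blank or malformed records.
        subject, operation, obj = parts
        edges.append((subject, operation, obj))
    return edges


def to_graph(edges):
    """Build an adjacency map: vertex -> list of (operation, target vertex)."""
    graph = {}
    for subject, operation, obj in edges:
        graph.setdefault(subject, []).append((operation, obj))
        graph.setdefault(obj, [])  # Register vertices with no outgoing edges.
    return graph


graph = to_graph(parse_audit_log(AUDIT_LOG))
```

The resulting `graph` has four vertices (`bash`, `firefox`, `Mal.exe`, and `Mal`) connected in a single chain, which matches the sequence of events described above.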

Note: We can also use provenance graphs to investigate security alerts fired by other monitoring products. Starting from the alerted entity, analysts can use the graph to examine all the modules connected to and associated with the malware.
