Capture metadata — frame 38899, 2026-05-14T06:30:29.492384+00:00, Firefox, "Screenpipe — Archive — Personal", monitor_2, app.screenpipe.lakylak.xyz:
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740229492_m2.jpg...
You said
Explain to me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work-in-progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system's audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
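The chunking step above can be sketched in a few lines. This is an illustrative example, not ScreenPipe's actual code; the sample rate and 30-second chunk length are assumptions chosen for the sketch:

```python
# Illustrative sketch: slicing a continuous stream of audio samples into
# fixed-length chunks for downstream transcription. Not ScreenPipe's code;
# the chunk duration is an assumed value.

SAMPLE_RATE = 16_000           # samples per second (typical for speech models)
CHUNK_SECONDS = 30             # assumed chunk duration
CHUNK_SIZE = SAMPLE_RATE * CHUNK_SECONDS

def split_into_chunks(samples):
    """Yield successive fixed-length chunks; the final chunk may be shorter."""
    for start in range(0, len(samples), CHUNK_SIZE):
        yield samples[start:start + CHUNK_SIZE]
```

A real recorder would feed these chunks to the transcription queue as they fill, rather than slicing a finished buffer.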
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: As it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
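The FTS5 search described above can be tried directly with Python's built-in sqlite3 module. The table and column names here are illustrative, not ScreenPipe's real schema:

```python
# Minimal FTS5 sketch using Python's built-in sqlite3 module.
# The schema is hypothetical, for illustration only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE transcripts USING fts5(speaker, text, timestamp)"
)
conn.execute(
    "INSERT INTO transcripts VALUES (?, ?, ?)",
    ("speaker_1", "let's move the launch to next quarter", "2026-05-14T06:30:00Z"),
)

def search(conn, phrase):
    """Full-text search: FTS5's MATCH operator finds rows containing the phrase."""
    cur = conn.execute(
        "SELECT speaker, text FROM transcripts WHERE transcripts MATCH ?", (phrase,)
    )
    return cur.fetchall()
```

Calling `search(conn, "launch")` returns the matching row; FTS5 indexes every term at insert time, which is why lookups stay fast even over weeks of transcripts.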
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
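The three stages above can be modeled as a classic bounded producer/consumer queue. This toy sketch stands in for the real Whisper worker; the queue size and the `transcribe` function are placeholders:

```python
# Toy sketch of the WIP stage: a producer puts raw chunks on a bounded queue,
# a worker "transcribes" them, and results are committed in order.
import queue
import threading

work_queue = queue.Queue(maxsize=8)   # bounded: backpressure when the model lags
results = []

def transcribe(chunk):
    # Stand-in for a real speech-to-text call.
    return f"transcript of {chunk}"

def worker():
    while True:
        chunk = work_queue.get()
        if chunk is None:             # sentinel: no more audio
            break
        results.append(transcribe(chunk))

t = threading.Thread(target=worker)
t.start()
for chunk in ["chunk_0", "chunk_1", "chunk_2"]:
    work_queue.put(chunk)             # blocks if the queue is full (the WIP stage)
work_queue.put(None)
t.join()
```

The bounded queue is the key design choice: when transcription falls behind, `put` blocks, which is exactly the backlog of unprocessed chunks the answer calls the WIP stage.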
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (…): This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The … or … folders: This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary … chunks, or locked database journals (like...
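One way to approximate the "done vs. pending" check is to compare the media files on disk against the filenames the database already references. The helper below is a hypothetical illustration, since the real schema is not shown here:

```python
# Hypothetical sketch: files present in the media folder but not yet
# referenced by the database are still "in progress". The directory layout
# and the idea that the DB references finished files are assumptions.
from pathlib import Path

def pending_files(media_dir, indexed_names):
    """Return files on disk that the database does not yet reference."""
    on_disk = {p.name for p in Path(media_dir).iterdir() if p.is_file()}
    return sorted(on_disk - set(indexed_names))
```

In practice you would populate `indexed_names` with a query against the SQLite database, then treat anything `pending_files` returns as not yet transcribed.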
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
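The buffer, queue, and commit flow can be sketched like this. The transcription function is a placeholder standing in for Whisper, and the table name is invented for the demo:

```python
import queue
import sqlite3

# Sketch of the WIP stage: raw chunks wait in a queue until a worker
# "transcribes" them and commits the text with a timestamp.
pending = queue.Queue()
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE audio_transcriptions (timestamp TEXT, text TEXT)")

def fake_transcribe(chunk: bytes) -> str:
    # Placeholder for the real speech-to-text engine (Whisper).
    return f"<{len(chunk)} bytes of speech>"

# Producer side: the capture loop enqueues chunks as they arrive.
for ts in ("06:30:00", "06:30:30", "06:31:00"):
    pending.put((ts, b"\x00" * 960_000))

# Consumer side: drain the backlog; each finished chunk is committed ("done").
while not pending.empty():
    ts, chunk = pending.get()
    db.execute("INSERT INTO audio_transcriptions VALUES (?, ?)",
               (ts, fake_transcribe(chunk)))
db.commit()

done = db.execute("SELECT COUNT(*) FROM audio_transcriptions").fetchone()[0]
```

If transcription lags behind capture, the queue simply grows; nothing appears in the database until the worker catches up, which is exactly the "backlog" state described above.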
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite database: This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The media folders: This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary chunks, or locked database journals, this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
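One way to script that done-versus-in-progress check. The suffixes used to flag in-flight files (.tmp for temp chunks, -wal and -journal for SQLite journals) are guesses at typical names, not a documented ScreenPipe contract, and the demo runs on a throwaway directory standing in for the real data folder:

```python
import os
import tempfile

# Hypothetical suffixes marking files still being written or processed.
WIP_SUFFIXES = (".tmp", "-wal", "-journal")

def classify(data_dir: str) -> dict:
    """Walk a data directory and bucket files into done vs. in-progress."""
    status = {"done": [], "in_progress": []}
    for _root, _dirs, files in os.walk(data_dir):
        for name in files:
            bucket = "in_progress" if name.endswith(WIP_SUFFIXES) else "done"
            status[bucket].append(name)
    return status

# Demo on a temporary directory with one committed frame, one temp audio
# chunk, and one write-ahead log (all filenames are illustrative).
with tempfile.TemporaryDirectory() as d:
    for name in ("1778740228891_m2.jpg", "chunk_0042.tmp", "db.sqlite-wal"):
        open(os.path.join(d, name), "w").close()
    report = classify(d)
```

Seeing the -wal entry shrink or vanish while new media files appear is the practical signal that the backlog has been committed.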
|
38897
|
NULL
|
NULL
|
NULL
|
|
38897
|
1440
|
28
|
2026-05-14T06:30:28.891390+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740228891_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — 
Archive","depth":5,"bounds":{"left":0.013297873,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: 
db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New 
Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat 
settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share 
conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? 
How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. 
Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. 
How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. 
Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. 
The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. 
If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. 
If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). 
Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false}]...
|
7628314054346390763
|
9212449551993698967
|
click
|
accessibility
|
NULL
|
Conversation with Gemini
You said
Explain to me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work-in-progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
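The chunking step described above can be sketched as follows. This is an illustrative sketch only: the 16 kHz sample rate, the 30-second chunk length, and the `chunk_samples` helper are assumptions for the example, not ScreenPipe's actual code.

```python
# Illustrative sketch (not ScreenPipe's actual code): split a continuous
# stream of audio samples into fixed-length chunks ready for transcription.
SAMPLE_RATE = 16_000   # samples per second (assumed)
CHUNK_SECONDS = 30     # assumed chunk length

def chunk_samples(samples, sample_rate=SAMPLE_RATE, chunk_seconds=CHUNK_SECONDS):
    """Yield consecutive fixed-length chunks; the final partial chunk is kept."""
    size = sample_rate * chunk_seconds
    for start in range(0, len(samples), size):
        yield samples[start:start + size]

# A 65-second "recording" becomes three chunks: 30 s, 30 s, and a 5 s tail.
stream = [0.0] * (SAMPLE_RATE * 65)
chunks = list(chunk_samples(stream))
```

Each chunk is then small enough to hand to the speech-to-text engine independently, which is what makes the queue-based processing in the later sections possible.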
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage:
The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
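The FTS5 search described above can be demonstrated with a tiny in-memory database. The table and column names here are illustrative assumptions, not ScreenPipe's actual schema.

```python
import sqlite3

# Minimal FTS5 demo (illustrative schema, not ScreenPipe's real one):
# an FTS5 virtual table indexes every transcribed chunk for full-text search.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE transcripts USING fts5(timestamp, speaker, text)"
)
conn.executemany(
    "INSERT INTO transcripts VALUES (?, ?, ?)",
    [
        ("2026-04-23T10:00", "speaker_1", "let's move the deadline to Friday"),
        ("2026-04-23T10:01", "speaker_0", "I'll update the project board"),
    ],
)
# Full-text search: find the chunk where someone mentioned a deadline.
row = conn.execute(
    "SELECT timestamp, text FROM transcripts WHERE transcripts MATCH ?",
    ("deadline",),
).fetchone()
```

The `MATCH` query is what makes "find the phrase from three weeks ago" an index lookup rather than a scan of every stored transcript.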
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
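A rough way to spot this WIP stage on disk is to classify files by name. In the sketch below, the `-wal`/`-journal` suffixes are genuine SQLite sidecar-file conventions, but the `.tmp` suffix for unprocessed chunks is a hypothetical naming assumption, not something ScreenPipe is documented to use.

```python
# Illustrative classifier: which files in a data directory look "in progress"?
# SQLite really does create -wal / -journal sidecar files while a database is
# being written; the .tmp suffix for audio chunks is a hypothetical convention.
WIP_SUFFIXES = (".tmp", "-wal", "-journal")

def in_progress(filenames):
    """Return the filenames that look like transient, not-yet-finalized data."""
    return [name for name in filenames if name.endswith(WIP_SUFFIXES)]

listing = ["db.sqlite", "db.sqlite-wal", "chunk_0042.tmp", "1778740228354.mp4"]
backlog = in_progress(listing)
```

If the list returned here stays empty over time, the engine has caught up and everything on disk has been committed.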
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database:
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary chunks, or locked database journals, this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
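Putting the last two sections together: "done" means a media file on disk has a matching row in the database, so comparing the two sets reveals the backlog. A minimal sketch, assuming (hypothetically) that each database row records the path of the media file it was transcribed from:

```python
# Illustrative audit (hypothetical schema): a media file counts as "done"
# once some database row references it; anything else is still backlog.
def unprocessed(media_files, db_referenced_paths):
    """Return media files on disk that no database row points at yet."""
    return sorted(set(media_files) - set(db_referenced_paths))

on_disk = {"2026-05-14/a.mp4", "2026-05-14/b.mp4", "2026-05-14/c.mp4"}
in_db = {"2026-05-14/a.mp4", "2026-05-14/b.mp4"}
pending = unprocessed(on_disk, in_db)
```

In a live setup you would populate `on_disk` by walking the data directory and `in_db` from a query against the SQLite database; the set difference is the "not yet done" list.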
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
|
-3552479194466422982
|
9212447352970443671
|
click
|
accessibility
|
NULL
|
You said
Explain to me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: As it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
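The FTS5 lookup described above can be sketched with Python's built-in sqlite3 module. The table and column names here (audio_transcriptions_fts, timestamp, transcription) are placeholder assumptions for illustration, since the conversation does not show Screenpipe's actual schema:

```python
import sqlite3

def search_transcripts(con: sqlite3.Connection, phrase: str):
    """Full-text search over a transcript index via FTS5 MATCH.

    Table/column names are illustrative stand-ins, not Screenpipe's
    real schema. `rank` orders results by FTS5 relevance.
    """
    return con.execute(
        "SELECT timestamp, transcription "
        "FROM audio_transcriptions_fts "
        "WHERE audio_transcriptions_fts MATCH ? "
        "ORDER BY rank",
        (phrase,),
    ).fetchall()
```

An index like this is why a phrase from weeks ago comes back instantly: the MATCH query hits the inverted index instead of scanning every row.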
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database: This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data folders: This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary audio chunks, or locked database journals (such as -wal or -journal sidecar files), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
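A quick way to check for those journal sidecars is a few lines of Python. This is a minimal sketch that assumes the main database is named db.sqlite and uses SQLite's standard sidecar names; adjust the names for your install:

```python
import os

# Assumed filenames: db.sqlite plus SQLite's standard sidecar suffixes.
JOURNAL_SUSPECTS = ("db.sqlite-wal", "db.sqlite-shm", "db.sqlite-journal")

def backlog_indicators(screenpipe_dir: str) -> list:
    """Return journal/WAL sidecar files present next to the database.

    Their presence can hint that writes (transcription commits) are
    still in flight, though a -wal file may also linger while idle.
    """
    return [name for name in JOURNAL_SUSPECTS
            if os.path.exists(os.path.join(screenpipe_dir, name))]
```

Note the caveat in the docstring: a -wal file is normal while any process holds the database open in WAL mode, so treat this as a hint rather than proof of a backlog.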
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text; it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
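The lookup the playback UI performs can be sketched with sqlite3. The table and column names (audio_transcriptions, audio_chunks, audio_chunk_id, offset_seconds) are hypothetical stand-ins, since the real schema isn't shown here:

```python
import sqlite3

def locate_audio(con: sqlite3.Connection, phrase: str):
    """Resolve a transcript hit to (media file path, offset in seconds).

    Schema names are illustrative assumptions: a transcriptions table
    holding text plus an offset, joined to a chunks table holding the
    on-disk .mp4 path.
    """
    return con.execute(
        "SELECT c.file_path, t.offset_seconds "
        "FROM audio_transcriptions AS t "
        "JOIN audio_chunks AS c ON c.id = t.audio_chunk_id "
        "WHERE t.transcription LIKE ? "
        "LIMIT 1",
        ("%" + phrase + "%",),
    ).fetchone()
```

Handing the returned path and offset to any media player reproduces the "jump to the exact second" behavior described above.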
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth to verify what was actually said.
How the Database Views Them
If you were to open up your db.sqlite and query the tables (e.g., audio_transcriptions), you would see that the database treats the ~/.screenpipe/data/ folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like soundcore AeroClip (input)_2026-05-12_07-40-48.mp4.
What happens if you delete them?
If you manually rm those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
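Before deleting anything, you can audit which database references already point at missing files. A minimal sketch, again assuming a hypothetical audio_chunks table with a file_path column:

```python
import os
import sqlite3

def find_missing_media(con: sqlite3.Connection) -> list:
    """List file paths the database references that no longer exist on
    disk, i.e., the transcripts whose playback would fail after a
    manual rm. Table/column names are assumptions for illustration.
    """
    paths = [row[0] for row in con.execute("SELECT file_path FROM audio_chunks")]
    return [p for p in paths if not os.path.exists(p)]
```

Running this after a cleanup tells you exactly which older transcripts have become text-only.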
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection: ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving: If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly move older files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
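That cron-job idea can be sketched in Python. The directory names, retention window, and the assumption that the NAS is mounted as a local path are all placeholders; rehearse on copies before pointing it at real data:

```python
import os
import shutil
import time

def archive_old_media(data_dir: str, archive_dir: str, max_age_days: int = 30):
    """Move media files older than max_age_days from data_dir to
    archive_dir (e.g., a NAS mount) and leave a symlink behind, so
    paths stored in the database keep resolving.
    """
    cutoff = time.time() - max_age_days * 86400
    moved = []
    for name in sorted(os.listdir(data_dir)):
        src = os.path.join(data_dir, name)
        # Skip directories and files we already replaced with symlinks.
        if not os.path.isfile(src) or os.path.islink(src):
            continue
        if os.path.getmtime(src) >= cutoff:
            continue  # still recent; leave it local
        dst = os.path.join(archive_dir, name)
        shutil.move(src, dst)
        os.symlink(dst, src)  # original path stays valid
        moved.append(name)
    return moved
```

Scheduling this nightly via cron (or launchd on macOS) keeps the local folder bounded while the NAS holds the full archive.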
You said
Yes, I will do that. Is there a way to set up languages for transcription?
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription, yes, you can configure it, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
By default, ScreenPipe uses a multilingual Whisper model and sets the language configuration to Auto-Detect.
When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage: If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback: Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
Open the ScreenPipe settings.
Navigate to the Audio or AI/Transcription section.
Look for the Whisper Language or Transcription Language dropdown.
Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).
If you are running ScreenPipe via CLI/Config:
You can modify your underlying configuration (usually found in ~/.screenpipe/pipe.json or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes: sk (Slovak), bg (Bulgarian), en (English).
If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
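If you do edit the JSON config directly, a small helper keeps the rest of the file intact. Note the key name "transcription_language" is a hypothetical placeholder; check Screenpipe's actual pipe.json schema before relying on this:

```python
import json

def set_transcription_language(config_path: str, code: str) -> None:
    """Write an ISO language code into a JSON config file, preserving
    all other keys. The key name used here is an assumption, not
    Screenpipe's documented schema.
    """
    with open(config_path) as f:
        cfg = json.load(f)
    cfg["transcription_language"] = code  # e.g. "sk", "bg", "en", or "auto"
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=2)
```

Calling set_transcription_language("~/.screenpipe/pipe.json" expanded to a real path, "sk") before a Slovak-only meeting, then switching back to "auto" afterwards, matches the workflow described above.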
|
[{"role":"AXStaticText","text& [{"role":"AXStaticText","text":"Click to scroll · shows current viewport","depth":2,"bounds":{"left":0.56050533,"top":0.28172386,"width":0.0674867,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — 
Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXButton","text":"Mute tab","depth":5,"bounds":{"left":0.011469414,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.020113032,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: 
db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New 
Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat 
You said:
Explain to me: how does the ScreenPipe audio recording work? Where does it start? How does it get transferred and transcribed? Is there a work-in-progress stage? How can I tell from the folder structure what has and hasn't been processed?

Gemini said:
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.

ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.

1. Where It Starts: The Capture Layer

The process begins locally by tapping directly into your operating system's audio routing. ScreenPipe simultaneously captures two separate audio streams:

- System Audio: everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
- Microphone Input: everything you say into your mic.

Because ScreenPipe runs as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
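The capture-and-chunk step can be sketched in a few lines of Python. The 16 kHz sample rate and 30-second chunk length here are illustrative assumptions for the sketch, not ScreenPipe's documented values.

```python
# Minimal sketch of cutting a continuous PCM sample stream into
# fixed-duration chunks before handing them to the transcriber.
# SAMPLE_RATE and CHUNK_SECONDS are assumed values for illustration.
SAMPLE_RATE = 16_000           # samples per second (assumption)
CHUNK_SECONDS = 30             # chunk duration (assumption)
CHUNK_SIZE = SAMPLE_RATE * CHUNK_SECONDS

def chunk_stream(samples: list[int]) -> list[list[int]]:
    """Split a flat sample buffer into fixed-size chunks; the final
    chunk may be shorter (end of a recording session)."""
    return [samples[i:i + CHUNK_SIZE] for i in range(0, len(samples), CHUNK_SIZE)]

# 70 seconds of audio -> two full 30 s chunks plus a 10 s remainder
chunks = chunk_stream([0] * (SAMPLE_RATE * 70))
print([len(c) // SAMPLE_RATE for c in chunks])  # -> [30, 30, 10]
```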
2. How It Gets Transferred and Transcribed

Because ScreenPipe prioritizes local processing, the "transfer" step is extremely short: the audio data never leaves your machine unless you explicitly set up a cloud plugin.

Here is the transcription pipeline:

- The Engine: the raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure cloud providers like Deepgram for faster processing, but local Whisper is the standard.)
- Diarization: as it transcribes the text, the engine also performs "diarization", the technical term for speaker identification. It analyzes the audio to distinguish your voice from the voices of others, labeling who said what.
- Storage: the final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
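To make the storage step concrete, here is a minimal FTS5 sketch using Python's stdlib sqlite3 module. The `transcripts` table and its columns are invented for this example; they are not ScreenPipe's actual schema.

```python
import sqlite3

# Toy full-text index; table and column names are assumptions
# for illustration, not ScreenPipe's real schema.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE transcripts USING fts5(text, timestamp UNINDEXED)")
db.executemany(
    "INSERT INTO transcripts (text, timestamp) VALUES (?, ?)",
    [
        ("let's revisit the retention settings on Friday", "2026-05-12T07:41:02"),
        ("the RAID rebuild finished overnight", "2026-05-12T09:10:45"),
    ],
)

# MATCH goes through the full-text index, so lookups stay fast
# even across months of accumulated transcripts.
hits = db.execute(
    "SELECT timestamp FROM transcripts WHERE transcripts MATCH ?",
    ("retention",),
).fetchall()
print(hits)  # -> [('2026-05-12T07:41:02',)]
```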
3. The "Work in Progress" (WIP) Stage

There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:

- Buffering: audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on disk.
- Processing Queue: Whisper requires CPU/GPU resources to transcribe. During a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
- Finalization: once the model finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.

4. Understanding the Folder Structure

All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux, or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:

- The SQLite database (db.sqlite): this is the master ledger. If text exists inside this database, the audio has been fully transcribed, diarized, and is "done."
- The data folder: this is where the compressed raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
- Temp files: rapidly changing files, temporary chunks, or locked database journals indicate the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.

Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
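The folder-level signals described above can be turned into a quick heuristic check. This is a sketch under assumptions: it presumes the default ~/.screenpipe layout with db.sqlite at the root and media under data/, and treats a -wal/-journal sidecar file as a sign of in-flight writes.

```python
from pathlib import Path

def processing_state(root: Path) -> dict:
    """Heuristic only: a -wal/-journal sidecar next to db.sqlite
    suggests the transcriber is still writing (the WIP stage), while
    .mp4 files under data/ are the finished raw archive. Layout is
    an assumption based on the defaults discussed above."""
    data = root / "data"
    media = sorted(p.name for p in data.glob("*.mp4")) if data.is_dir() else []
    busy = any((root / f"db.sqlite{s}").exists() for s in ("-wal", "-journal"))
    return {"archived_media": media, "writes_in_flight": busy}
```

Pointing this at ~/.screenpipe lists the archived recordings and flags whether a database journal hints at a transcription backlog.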
You said:
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4, MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4, soundcore AeroClip (input)_2026-05-12_07-40-48.mp4, or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after being transcribed and stored in the SQLite db?

Gemini said:
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive. Here is exactly what those files are doing in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.

1. The "Time Machine" Playback

The primary reason ScreenPipe keeps these files is audio playback.
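The timestamped filenames in the question are what make this possible: each name encodes the recording's start time, so a transcript timestamp can be converted into a seek offset. A sketch, assuming the filename convention visible above:

```python
from datetime import datetime

def seek_offset(media_filename: str, spoken_at: datetime) -> float:
    """Parse a '..._YYYY-MM-DD_HH-MM-SS.mp4' name (the convention seen
    in ~/.screenpipe/data/) into the recording's start time and return
    how many seconds into the file a word was spoken."""
    # rsplit keeps spaces/parentheses in the device name intact
    name, day, clock = media_filename.rsplit("_", 2)
    started = datetime.strptime(f"{day} {clock[:-len('.mp4')]}", "%Y-%m-%d %H-%M-%S")
    return (spoken_at - started).total_seconds()

offset = seek_offset("System Audio (output)_2026-05-11_06-17-14.mp4",
                     datetime(2026, 5, 11, 6, 19, 44))
print(offset)  # -> 150.0
```

A player can then be started at that offset to hear the moment the keyword was spoken.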
When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references. When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text: it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files you keep the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.

2. Re-processing and Model Upgrades

Local LLMs and transcription models are improving rapidly.
Right now, your system might be using a quantized Whisper model to balance CPU load against accuracy. If you later decide to run a heavier, more accurate model, or a radically better speaker-diarization algorithm is released, having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.

3. The Source of Truth for Hallucinations

Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise.
If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth for verifying what was actually said.

How the Database Views Them

If you open db.sqlite and query the tables (e.g., audio_transcriptions), you will see that the database treats the ~/.screenpipe/data/ folder essentially as a blob-storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like soundcore AeroClip (input)_2026-05-12_07-40-48.mp4.

What happens if you delete them? If you manually rm those files to free up space on your Mac's SSD, ScreenPipe won't crash, and the SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.

Managing the Storage Footprint

Because ScreenPipe records continuously, this folder will inevitably grow over time.

- Built-in garbage collection: ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days, or once the folder hits a specific GB limit).
- Archiving: if you prefer to keep a permanent, searchable "life log" without eating up local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly move older files to your 4-drive RAID 5 array, replacing the local files with symlinks. This keeps the database references intact while offloading the heavy media storage to your NAS.
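The cron-job idea above can be sketched as follows. Everything here is an assumption-laden illustration: the directory arguments are placeholders for your actual SSD and NAS mounts, and the 30-day cutoff mirrors the retention example rather than any ScreenPipe setting.

```python
import shutil
import time
from pathlib import Path

def archive_old_media(data_dir: Path, nas_dir: Path, max_age_days: float = 30) -> list[str]:
    """Move media older than max_age_days to the NAS mount and leave a
    symlink behind, so file paths stored in db.sqlite still resolve
    when ScreenPipe tries to play them back. Paths and cutoff are
    placeholders, not ScreenPipe defaults."""
    cutoff = time.time() - max_age_days * 86_400
    moved = []
    nas_dir.mkdir(parents=True, exist_ok=True)
    for mp4 in sorted(data_dir.glob("*.mp4")):
        if mp4.is_symlink() or mp4.stat().st_mtime > cutoff:
            continue  # already archived, or still recent
        target = nas_dir / mp4.name
        shutil.move(str(mp4), str(target))
        mp4.symlink_to(target)  # the original path keeps working
        moved.append(mp4.name)
    return moved
```

Scheduled from cron (or launchd on macOS), this keeps the local data folder lean while the database's soft foreign keys stay valid.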
You said:
Yes, I will do that. Is there a way to set up languages for transcription?

Gemini said:
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS, if you end up building it) will keep your local SSD lean while keeping your database searchable.

As for setting up languages for transcription: yes, you can configure it, but it helps to first understand how ScreenPipe handles language natively.

How the Transcription Engine Handles Language

By default, ScreenPipe uses a multilingual Whisper model and sets the language configuration to Auto-Detect. When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk under that assumption.

- The advantage: if you are watching a tutorial in English, and an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
- The drawback: auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it may try to force Slovak words into English spelling, resulting in gibberish.

How to Force a Specific Language

If you want maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up processing, because the model skips the detection phase.

If you are using the Desktop App UI:

1. Open the ScreenPipe settings.
2. Navigate to the Audio or AI/Transcription section.
3. Look for
the","depth":29,"bounds":{"left":0.09142287,"top":0.8607342,"width":0.030585106,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper Language","depth":29,"bounds":{"left":0.12200798,"top":0.8607342,"width":0.04837101,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"bounds":{"left":0.17037898,"top":0.8607342,"width":0.0078125,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Transcription Language","depth":29,"bounds":{"left":0.17819148,"top":0.8607342,"width":0.061502658,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"dropdown.","depth":29,"bounds":{"left":0.23969415,"top":0.8607342,"width":0.027260639,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Change it from \"Auto\" to your specific language (e.g., English, Bulgarian, or Slovak).","depth":29,"bounds":{"left":0.09142287,"top":0.8902634,"width":0.20079787,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you are running ScreenPipe via CLI/Config:","depth":27,"bounds":{"left":0.0787899,"top":0.92378294,"width":0.116023935,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You can modify your underlying configuration (usually found 
in","depth":27,"bounds":{"left":0.0787899,"top":0.9445331,"width":0.15159574,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/pipe.json","depth":28,"bounds":{"left":0.23238032,"top":0.94573027,"width":0.064328454,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:","depth":27,"bounds":{"left":0.0787899,"top":0.9445331,"width":0.234375,"height":0.05546689},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"(Slovak)","depth":29,"bounds":{"left":0.14012633,"top":1.0,"width":0.020777926,"height":-0.015562654},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"(Bulgarian)","depth":29,"bounds":{"left":0.14012633,"top":1.0,"width":0.027925532,"height":-0.04509175},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"(English)","depth":29,"bounds":{"left":0.14012633,"top":1.0,"width":0.02244016,"height":-0.07462096},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":23,"on_screen":false,"help_text":"","role_description":"toggle 
button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":23,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Redo","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Share and export","depth":22,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":22,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXTextArea","text":"Ask Gemini","depth":20,"bounds":{"left":0.08211436,"top":0.83439744,"width":0.22573139,"height":0.01915403},"on_screen":true,"value":"Ask Gemini","help_text":"","role_description":"text entry area","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Ask Gemini","depth":21,"bounds":{"left":0.08211436,"top":0.8347965,"width":0.030086435,"height":0.018355945},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
|
-6491027559172623413
|
8632611289480965085
|
visual_change
|
accessibility
|
NULL
|
Click to scroll · shows current viewport
Screenpip Click to scroll · shows current viewport
Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Mute tab
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said
Explain to me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work-in-progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: As it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
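The FTS5 lookup described above can be sketched with Python's built-in sqlite3 module. The table and column names here are purely illustrative, not ScreenPipe's real schema:

```python
import sqlite3

# In-memory stand-in for ScreenPipe's SQLite database; the real schema differs.
con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE transcripts USING fts5(text, timestamp UNINDEXED)")
con.execute("INSERT INTO transcripts VALUES (?, ?)",
            ("we agreed to ship the RAID 5 migration on Friday", "2026-05-12T09:14:00"))
con.execute("INSERT INTO transcripts VALUES (?, ?)",
            ("lunch order for the team", "2026-05-12T12:01:00"))

# FTS5's MATCH operator does the indexed full-text lookup that makes
# "search a phrase from three weeks ago" effectively instant.
rows = con.execute(
    "SELECT timestamp, text FROM transcripts WHERE transcripts MATCH ?",
    ("RAID",)).fetchall()
print(rows)
```

The virtual-table approach means the text index lives alongside the data, so no external search service is needed.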
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
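The buffer → queue → commit flow above can be shown in miniature. This is a toy sketch, not ScreenPipe's code: transcribe_chunk stands in for the Whisper call, and the table name is assumed:

```python
import queue
import sqlite3

def transcribe_chunk(raw_audio: bytes) -> str:
    # Stand-in for the local Whisper inference step.
    return f"transcript of {len(raw_audio)} bytes"

# 1. Buffering: captured chunks wait in an in-memory queue (the WIP stage).
work = queue.Queue()
for chunk in (b"\x00" * 480, b"\x01" * 960):
    work.put(chunk)

# 2./3. Processing + finalization: drain the queue, commit text to SQLite.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE audio_transcriptions (id INTEGER PRIMARY KEY, text TEXT)")
while not work.empty():
    text = transcribe_chunk(work.get())
    db.execute("INSERT INTO audio_transcriptions (text) VALUES (?)", (text,))
db.commit()

done = db.execute("SELECT COUNT(*) FROM audio_transcriptions").fetchone()[0]
print(done)
```

The key property is that audio sits in the queue only until the transcriber catches up; once committed, the row in the database is the durable record.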
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (db.sqlite): This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data folder: This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary audio chunks, or locked database journal files, this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
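That "done vs. in progress" signal can be checked mechanically. The sketch below uses standard SQLite journal suffixes (-wal, -shm, -journal) as WIP markers; the directory layout and the .tmp chunk naming are assumptions, not ScreenPipe specifics:

```python
from pathlib import Path
import tempfile

def backlog_markers(data_dir: Path) -> list[str]:
    """Return file names that suggest active processing (the WIP stage):
    SQLite journal/WAL files and temporary chunk files."""
    wip_suffixes = (".sqlite-wal", ".sqlite-shm", ".sqlite-journal", ".tmp")
    return sorted(p.name for p in data_dir.iterdir()
                  if p.name.endswith(wip_suffixes))

# Demo on a throwaway directory standing in for ~/.screenpipe
root = Path(tempfile.mkdtemp())
for name in ("db.sqlite", "db.sqlite-wal", "chunk_0042.tmp",
             "System Audio (output)_2026-05-11_06-17-14.mp4"):
    (root / name).touch()

print(backlog_markers(root))
```

If this list is empty while new .mp4 files keep appearing, the transcriber has caught up with the capture layer.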
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in the SQLite DB?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text: it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth to verify what was actually said.
How the Database Views Them
If you were to open up your db.sqlite and query the tables (e.g., audio_transcriptions), you would see that the database treats the ~/.screenpipe/data/ folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like soundcore AeroClip (input)_2026-05-12_07-40-48.mp4.
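The soft-foreign-key idea can be illustrated with a toy table. The column names (transcription, file_path, offset_seconds) are assumptions for illustration, not verified against ScreenPipe's actual schema:

```python
import sqlite3
from pathlib import Path

db = sqlite3.connect(":memory:")
# Toy stand-in for ScreenPipe's audio tables; real column names may differ.
db.execute("""CREATE TABLE audio_transcriptions (
                  transcription TEXT, file_path TEXT, offset_seconds REAL)""")
db.execute("INSERT INTO audio_transcriptions VALUES (?, ?, ?)",
           ("let's check the NAS throughput",
            "/Users/lukas/.screenpipe/data/"
            "soundcore AeroClip (input)_2026-05-12_07-40-48.mp4",
            83.5))

# A search hit carries both the path and the offset needed for playback:
# the path is just a string pointing into the data folder (blob storage).
path, offset = db.execute(
    "SELECT file_path, offset_seconds FROM audio_transcriptions "
    "WHERE transcription LIKE ?", ("%NAS%",)).fetchone()
print(Path(path).name, offset)
```

Because the link is only a stored string, nothing in SQLite enforces that the file still exists, which is exactly why deleting media breaks playback but not search.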
What happens if you delete them?
If you manually rm those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection: ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving: If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly move older files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
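A minimal version of that cron job could look like this. The directory paths, the 30-day cutoff, and the .mp4-only filter are placeholders; test on copies before pointing it at real data:

```python
import os
import shutil
import tempfile
import time
from pathlib import Path

def archive_old_media(local_dir: Path, nas_dir: Path, max_age_days: int = 30) -> int:
    """Move .mp4 files older than max_age_days to the NAS and leave a
    symlink behind so the database's stored file paths keep resolving."""
    cutoff = time.time() - max_age_days * 86400
    moved = 0
    for f in list(local_dir.glob("*.mp4")):
        if f.is_symlink() or f.stat().st_mtime > cutoff:
            continue
        dest = nas_dir / f.name
        shutil.move(str(f), dest)   # copy+delete, so it works across filesystems
        f.symlink_to(dest)          # database path -> symlink -> NAS file
        moved += 1
    return moved

# Demo on throwaway directories standing in for the SSD and the NAS mount.
local, nas = Path(tempfile.mkdtemp()), Path(tempfile.mkdtemp())
old = local / "System Audio (output)_2026-05-11_06-17-14.mp4"
old.write_bytes(b"fake media")
os.utime(old, (0, 0))  # backdate the mtime so it counts as "old"
moved_count = archive_old_media(local, nas)
print(moved_count)
```

One caveat worth knowing: if ScreenPipe's own retention cleanup later deletes the symlink, the NAS copy survives, but a cleanup that follows symlinks could delete the archived file too, so check the retention settings before combining both.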
You said
Yes, I will do that. Is there a way to set up languages for transcription?
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription, yes, you can configure it, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
By default, ScreenPipe uses a multilingual Whisper model and sets the language configuration to Auto-Detect. When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage: If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback: Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
Open the ScreenPipe settings.
Navigate to the Audio or AI/Transcription section.
Look for the Whisper Language or Transcription Language dropdown.
Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).
If you are running ScreenPipe via CLI/Config:
You can modify your underlying configuration (usually found in ~/.screenpipe/pipe.json or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:
sk (Slovak)
bg (Bulgarian)
en (English)
If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
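As a hypothetical illustration only, forcing Slovak in a JSON config might look like the fragment below. The "language" key name is an assumption, not a verified ScreenPipe setting; check your actual pipe.json or the daemon's help output for the real key or flag:

```json
{
  "language": "sk"
}
```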
conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? 
How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. 
Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. 
How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. 
Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. 
The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. 
If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. 
If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). 
Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy 
prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Expand","depth":21,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
|
-8798313854882897714
|
9133636559051606935
|
click
|
accessibility
|
NULL
|
Conversation with Gemini
You said
Explain to me: how does the ScreenPipe audio recording work? Where does it start? How is it getting transferred and transcribed? Is there some work-in-progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
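The chunking step above can be sketched in a few lines. This is an illustration of the idea, not ScreenPipe's actual code; the 30-second chunk length and 16 kHz sample rate are assumptions chosen for the example.

```python
# Illustrative sketch: splitting a continuous audio stream into fixed-length
# chunks for downstream transcription. Values are assumptions, not ScreenPipe's.

SAMPLE_RATE = 16_000          # samples per second (typical for speech models)
CHUNK_SECONDS = 30            # hypothetical chunk length

def chunk_stream(samples, sample_rate=SAMPLE_RATE, chunk_seconds=CHUNK_SECONDS):
    """Yield consecutive fixed-length chunks from a flat list of samples."""
    size = sample_rate * chunk_seconds
    for start in range(0, len(samples), size):
        yield samples[start:start + size]

# Example: 65 seconds of silence becomes three chunks (30 s, 30 s, 5 s).
stream = [0.0] * (SAMPLE_RATE * 65)
chunks = list(chunk_stream(stream))
print([len(c) // SAMPLE_RATE for c in chunks])  # [30, 30, 5]
```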
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: As it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
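Diarization pairs two outputs: speaker turns (who spoke when) and transcript segments (what was said when). A toy sketch of matching them by timestamp overlap, with invented data structures rather than any real diarizer's output format:

```python
# Toy illustration: label transcript segments with speakers by matching
# overlapping time ranges from a diarizer. Data shapes are invented.

def overlap(a, b):
    """Length of overlap between two (start, end) intervals, in seconds."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def label_transcript(transcript, speaker_turns):
    """Assign each transcript segment the speaker with the largest overlap."""
    labeled = []
    for seg in transcript:
        span = (seg["start"], seg["end"])
        best = max(speaker_turns,
                   key=lambda t: overlap(span, (t["start"], t["end"])))
        labeled.append((best["speaker"], seg["text"]))
    return labeled

turns = [{"speaker": "you", "start": 0.0, "end": 4.0},
         {"speaker": "other", "start": 4.0, "end": 9.0}]
segments = [{"start": 0.5, "end": 3.5, "text": "hello"},
            {"start": 4.5, "end": 8.0, "text": "hi there"}]
print(label_transcript(segments, turns))
# [('you', 'hello'), ('other', 'hi there')]
```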
Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
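The storage design described above can be reproduced in miniature with Python's built-in sqlite3 module (assuming it was compiled with FTS5, as most builds are). The table and column names here are invented for illustration, not ScreenPipe's actual schema:

```python
# Minimal FTS5 sketch: transcribed text indexed in a local SQLite database
# and searched by phrase. Schema is hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE transcripts USING fts5(timestamp, speaker, text)")
conn.execute("INSERT INTO transcripts VALUES (?, ?, ?)",
             ("2026-05-14T06:30:00Z", "speaker_1", "let's review the quarterly budget"))
conn.execute("INSERT INTO transcripts VALUES (?, ?, ?)",
             ("2026-05-14T06:31:00Z", "speaker_2", "the demo is scheduled for friday"))
conn.commit()

# Instant full-text search across everything ever transcribed:
rows = conn.execute(
    "SELECT timestamp, text FROM transcripts WHERE transcripts MATCH ?",
    ("budget",),
).fetchall()
print(rows)  # [('2026-05-14T06:30:00Z', "let's review the quarterly budget")]
```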
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
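The queueing behaviour can be sketched with a single worker thread draining a FIFO queue. The transcribe function below is a stand-in for a real Whisper call, not an actual API:

```python
# Sketch of the WIP queue: chunks line up while one transcription worker
# drains them in order. transcribe() is a placeholder for real speech-to-text.
import queue
import threading

def transcribe(chunk):          # stand-in for the real Whisper call
    return f"text of {chunk}"

work = queue.Queue()
results = []

def worker():
    while True:
        chunk = work.get()
        if chunk is None:       # sentinel: no more audio
            break
        results.append(transcribe(chunk))   # "finalization" step
        work.task_done()

t = threading.Thread(target=worker)
t.start()
for name in ("chunk-1", "chunk-2", "chunk-3"):
    work.put(name)              # chunks can arrive faster than they are processed
work.put(None)
t.join()
print(results)  # ['text of chunk-1', 'text of chunk-2', 'text of chunk-3']
```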
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (…): This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The … or … folders: This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary … chunks, or locked database journals (like …), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
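A hedged sketch of that "done vs. in progress" check, run against a throwaway directory rather than a real ~/.screenpipe. The -wal/-journal suffixes are standard SQLite journal names; treating everything else as done is a simplification for the example:

```python
# Sketch: classify files in a data directory as finished archive material or
# active work-in-progress, based on journal/temp suffixes. File names here
# are illustrative, not ScreenPipe's guaranteed layout.
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
for name in ("db.sqlite", "db.sqlite-wal",
             "System Audio (output)_2026-05-11_06-17-14.mp4"):
    (root / name).touch()

def classify(folder):
    status = {}
    for f in sorted(folder.iterdir()):
        if f.name.endswith(("-wal", "-journal", ".tmp")):
            status[f.name] = "in progress"   # journal/temp file: active processing
        else:
            status[f.name] = "done"          # permanent archive or main database
    return status

print(classify(root))
```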
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
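The file names quoted in the question follow a recognizable pattern: a device name, an (input) or (output) marker, and a timestamp. A small parser illustrating that pattern (my own sketch, not a ScreenPipe utility):

```python
# Sketch: parse "<device> (input|output)_YYYY-MM-DD_HH-MM-SS.mp4" recording
# names into structured fields. The pattern is inferred from the examples.
import re
from datetime import datetime

PATTERN = re.compile(
    r"^(?P<device>.+) \((?P<direction>input|output)\)"
    r"_(?P<ts>\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})\.mp4$"
)

def parse_recording(name):
    m = PATTERN.match(name)
    if m is None:
        return None
    return {
        "device": m.group("device"),
        "direction": m.group("direction"),
        "recorded_at": datetime.strptime(m.group("ts"), "%Y-%m-%d_%H-%M-%S"),
    }

info = parse_recording("System Audio (output)_2026-05-11_06-17-14.mp4")
print(info["device"], info["direction"], info["recorded_at"].isoformat())
# System Audio output 2026-05-11T06:17:14
```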
Gemini said...
|
38890
|
NULL
|
NULL
|
NULL
|
|
38890
|
1440
|
24
|
2026-05-14T06:30:21.243833+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740221243_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Mute tab
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (db.sqlite): This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data folder: This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary chunks, or locked database journals, this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
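A rough triage of a directory listing along these lines can be automated. The -wal/-shm/-journal suffixes are standard SQLite sidecar files whose presence means the database is open and being written to; the media extensions follow the conventions described above:

```python
def classify(filename: str) -> str:
    """Rough triage of one entry from a ~/.screenpipe listing."""
    # SQLite write-ahead-log / shared-memory / rollback-journal sidecars:
    # present only while the database is actively being written.
    if filename.endswith(("-wal", "-shm", "-journal")):
        return "wip (database being written)"
    # Compressed audio and screenshot archive.
    if filename.endswith((".mp4", ".jpg")):
        return "archived media"
    if filename == "db.sqlite":
        return "master ledger"
    return "other"
```

Running it over `os.listdir` of the data directory gives a quick done-versus-in-progress snapshot without opening the database.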
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
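Those names follow a regular pattern, <device> (input|output)_YYYY-MM-DD_HH-MM-SS.mp4, which can be unpacked mechanically (assuming the pattern holds for every file in the folder):

```python
import re
from datetime import datetime

# <device name> (input|output)_YYYY-MM-DD_HH-MM-SS.mp4
NAME_RE = re.compile(
    r"^(?P<device>.+) \((?P<direction>input|output)\)"
    r"_(?P<stamp>\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})\.mp4$"
)

def parse_name(filename: str):
    """Return (device, direction, recording start time) for one file."""
    m = NAME_RE.match(filename)
    if m is None:
        raise ValueError(f"not a ScreenPipe audio file name: {filename}")
    started = datetime.strptime(m["stamp"], "%Y-%m-%d_%H-%M-%S")
    return m["device"], m["direction"], started
```

So each file already tells you which device it came from, whether it was mic input or system output, and when the recording started.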
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text; it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
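The jump from keyword hit to playback position is simple arithmetic once you have the utterance timestamp and the recording's start time. Here the start time is recovered from the file name; ScreenPipe's database may well store the offset directly, so treat this as an illustration:

```python
from datetime import datetime

def playback_offset(utterance_at: datetime, filename: str) -> float:
    """Seconds into the archived .mp4 at which playback should start.

    Assumes the file name ends with the recording's start time, e.g.
    'System Audio (output)_2026-05-11_06-17-14.mp4'.
    """
    stamp = filename.rsplit(")_", 1)[1].removesuffix(".mp4")
    started = datetime.strptime(stamp, "%Y-%m-%d_%H-%M-%S")
    return (utterance_at - started).total_seconds()
```

A keyword spoken at 06:20:00 in a file that began at 06:17:14 means seeking 166 seconds into that .mp4.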
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
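Re-processing is then just a loop over the archive with whichever engine you upgrade to. The transcriber is passed in as a function here, since the concrete engine is a deployment choice; the commented-out variant shows how the open-source whisper package could slot in:

```python
from typing import Callable, Dict, List

def reprocess(files: List[str],
              transcribe: Callable[[str], str]) -> Dict[str, str]:
    """Re-run a (presumably better) engine over archived audio files.

    Returns new transcripts keyed by file name; in practice these would
    be written back into db.sqlite to replace the old rows.
    """
    return {name: transcribe(name) for name in files}

# With the open-source `whisper` package installed, the engine could be:
#   import whisper
#   model = whisper.load_model("large-v3")
#   new_text = reprocess(archive_files,
#                        lambda f: model.transcribe(f)["text"])
```

Because the engine is injected, the same loop works unchanged when the next generation of models arrives.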
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that...
|
-6273318103258931730
|
8635993258403187671
|
click
|
accessibility
|
NULL
|
Conversation with Gemini
You said
Explain to me: how does the ScreenPipe audio recording work? Where does it start? How is it getting transferred and transcribed? Is there some work-in-progress stage? How can I see from the folder structure what has been done and what hasn't?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
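The chunking step described above can be sketched in a few lines. This is a toy illustration, not ScreenPipe's actual code; the sample rate and 30-second chunk length are assumptions chosen for the example:

```python
# Toy sketch: splitting a continuous stream of audio samples into
# fixed-duration chunks ready to be handed to a transcription engine.
SAMPLE_RATE = 16_000  # samples per second; typical for speech models

def chunk_samples(samples, chunk_seconds=30):
    """Split a flat list of samples into chunks of `chunk_seconds` each."""
    size = SAMPLE_RATE * chunk_seconds
    return [samples[i:i + size] for i in range(0, len(samples), size)]

# 65 seconds of (silent, fake) audio becomes three chunks: 30 s, 30 s, 5 s.
stream = [0.0] * (SAMPLE_RATE * 65)
chunks = chunk_samples(stream)
```

The last chunk is simply shorter than the rest; a real recorder would keep appending to it until the stream closes or the chunk boundary is reached.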
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure a cloud provider such as Deepgram for faster processing, but local Whisper is the standard.)
Diarization: As it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
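A minimal sketch of what an FTS5-backed transcript search looks like, using Python's built-in sqlite3 module (which ships with FTS5 on most builds). The `transcripts` table and its columns are made up for the example, not ScreenPipe's real schema:

```python
import sqlite3

# In-memory demo of an FTS5 full-text index over transcript rows.
con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE transcripts USING fts5(text, timestamp)")
con.execute(
    "INSERT INTO transcripts VALUES (?, ?)",
    ("let's move the standup to Thursday", "2026-05-12T06:49:17"),
)
con.execute(
    "INSERT INTO transcripts VALUES (?, ?)",
    ("the deploy pipeline is green again", "2026-05-12T12:17:23"),
)
# MATCH goes through the inverted index, so this stays fast even
# over months of accumulated audio.
rows = con.execute(
    "SELECT timestamp FROM transcripts WHERE transcripts MATCH 'standup'"
).fetchall()
```

This is why searching for a phrase from weeks ago returns instantly: the query never scans the text linearly.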
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
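The three stages above can be sketched as a toy producer/consumer pipeline; every name here is illustrative, not a ScreenPipe internal:

```python
import queue

# Toy sketch of the WIP stage: chunks wait in a queue until the
# (stand-in) transcriber commits them to the database.
wip = queue.Queue()   # chunks recorded but not yet transcribed
database = []         # stand-in for the SQLite commit

# Capture side: the recorder can enqueue faster than we transcribe.
for chunk_id in ("chunk-001", "chunk-002", "chunk-003"):
    wip.put(chunk_id)

# Transcriber side: drain the backlog one chunk at a time.
while not wip.empty():
    chunk = wip.get()
    database.append((chunk, f"transcript of {chunk}"))

# Once the queue is empty, nothing is "in progress" anymore.
```

Anything sitting in `wip` corresponds to the temporary files you might see on disk while the engine is behind.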
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (db.sqlite): This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data/ folder: This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary chunks, or locked database journals (such as SQLite -wal or -journal files), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
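One way to answer "what hasn't been done yet?" is to diff the media files on disk against the paths the database references. The filenames below come from this conversation; the set-based check is a sketch, not a real query against ScreenPipe's schema:

```python
# Sketch of the "what's done?" check: any media file on disk whose
# path is NOT referenced from the database is still unprocessed.
on_disk = {
    "System Audio (output)_2026-05-11_06-17-14.mp4",
    "MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4",
    "soundcore AeroClip (input)_2026-05-12_07-40-48.mp4",
}
# In a real install this set would come from a SELECT over the
# database's file-path column (column name is a guess).
in_database = {
    "System Audio (output)_2026-05-11_06-17-14.mp4",
    "MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4",
}
pending = on_disk - in_database  # recorded but not yet indexed
```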
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in the SQLite db?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text; it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
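The seek logic this implies is plain timestamp arithmetic: the filename encodes when the recording started, and the database row carries when the words were spoken. The concrete timestamps below are illustrative:

```python
from datetime import datetime

# Sketch: compute how far into the stored .mp4 to seek.
# The file's embedded timestamp marks when the recording started.
file_start = datetime(2026, 5, 12, 6, 49, 17)  # from ..._2026-05-12_06-49-17.mp4
keyword_at = datetime(2026, 5, 12, 6, 53, 2)   # timestamp stored with the text

seek_seconds = (keyword_at - file_start).total_seconds()
# A player would open the .mp4 and seek to this offset before playing.
```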
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
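A re-processing pass could be as simple as walking the archive and building transcription commands for each recording. The `whisper` CLI (from the openai-whisper package), the model name, and the flags here are assumptions about tooling, not something ScreenPipe itself provides; the commands are built but deliberately not executed:

```python
# Sketch: build (but don't run) re-transcription commands for every
# archived recording, e.g. to upgrade to a heavier model later.
archive = [
    "System Audio (output)_2026-05-11_06-17-14.mp4",
    "MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4",
]
commands = [
    ["whisper", path, "--model", "large-v3", "--output_format", "json"]
    for path in archive
]
# Each entry could be passed to subprocess.run() to actually re-transcribe.
```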
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that...
The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. 
If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. 
If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). 
Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy 
prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. 
Are these used for anything after transcribing and storing in sqlite db?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Expand","depth":21,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Yes, those","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files are absolutely still used after the transcription process is complete. 
They are not temporary processing chunks; they act as your permanent media archive.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is exactly what those files are doing sitting in your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"directory and how they interact with the rest of the system.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"1. The \"Time Machine\" Playback","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. The \"Time Machine\" Playback","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
|
-6561738457798499810
|
9207950348802247319
|
visual_change
|
accessibility
|
NULL
|
Conversation with Gemini
You said
Explain to me: how does the ScreenPipe audio recording work? Where does it start? How is it getting transferred and transcribed? Is there some work-in-progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
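The chunking idea can be sketched in a few lines. This is a toy illustration, not ScreenPipe's actual code; the sample rate and chunk length are assumptions chosen for the example:

```python
# Minimal sketch of chunking a continuous stream (the constants are
# assumptions for illustration, not ScreenPipe's real settings).
SAMPLE_RATE = 16_000      # 16 kHz, a common speech-to-text sample rate
CHUNK_SECONDS = 30        # hypothetical chunk length

def chunk_stream(samples, sample_rate=SAMPLE_RATE, seconds=CHUNK_SECONDS):
    """Split a flat list of audio samples into fixed-length chunks."""
    size = sample_rate * seconds
    return [samples[i:i + size] for i in range(0, len(samples), size)]

# 65 seconds of "audio" -> two full 30 s chunks plus a 5 s remainder
chunks = chunk_stream([0] * (SAMPLE_RATE * 65))
print([len(c) // SAMPLE_RATE for c in chunks])  # -> [30, 30, 5]
```

Each chunk then moves through the pipeline independently, which is what lets transcription lag behind capture without losing audio.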
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: As it transcribes the text, the engine also performs "diarization" (a technical term for speaker identification). It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
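The FTS5 pattern is easy to demonstrate with Python's built-in sqlite3 module. The table and column names here are illustrative, not ScreenPipe's actual schema:

```python
import sqlite3

# A toy FTS5 index: transcribed text goes in, phrase search comes out.
# Table/column names are made up for this example.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE transcripts USING fts5(ts, text)")
db.execute("INSERT INTO transcripts VALUES (?, ?)",
           ("2026-05-12T06:49:17", "let's sync on the quarterly budget"))
db.execute("INSERT INTO transcripts VALUES (?, ?)",
           ("2026-05-12T12:17:23", "the deploy finished without errors"))

# MATCH goes through the full-text index, so it stays fast
# even with months of accumulated audio transcripts.
rows = db.execute(
    "SELECT ts FROM transcripts WHERE transcripts MATCH ?",
    ("quarterly budget",)).fetchall()
print(rows)  # -> [('2026-05-12T06:49:17',)]
```

A plain `LIKE '%...%'` query would scan every row; the FTS index is what makes "search everything I ever heard" practical.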
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
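The buffer-queue-finalize flow above is a classic producer/consumer pattern. A minimal sketch (function names are illustrative, not ScreenPipe internals):

```python
import queue
import threading

# Chunks waiting for the transcriber form the WIP stage.
audio_queue = queue.Queue()
database = []                 # stands in for the SQLite commit

def transcriber():
    """Consumer: drain the queue and 'commit' text with its timestamp."""
    while True:
        chunk = audio_queue.get()
        if chunk is None:     # sentinel meaning recording has stopped
            break
        database.append((chunk["ts"], f"text for {chunk['ts']}"))

worker = threading.Thread(target=transcriber)
worker.start()

# Producer: the capture side enqueues chunks as fast as it records them,
# even if the transcriber temporarily falls behind.
for ts in ("06:49:17", "06:49:47", "06:50:17"):
    audio_queue.put({"ts": ts})
audio_queue.put(None)
worker.join()
print(len(database))  # -> 3
```

The key property is that capture never blocks on transcription: a backlog just grows in the queue until the CPU/GPU catches up, which matches the "rapid conversation" bottleneck described above.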
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database: This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The … or … folders: This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary … chunks, or locked database journals (like …), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
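One concrete signal worth knowing: SQLite names its sidecar files by appending "-journal" or "-wal" to the database path, so their presence usually means a write is in flight. A small heuristic check (the directory layout is an assumption; the demo uses a throwaway directory rather than a real ~/.screenpipe):

```python
import tempfile
from pathlib import Path

def looks_busy(data_dir):
    """Heuristic: SQLite journal/WAL sidecars suggest an active write."""
    root = Path(data_dir)
    return any(root.glob("*-wal")) or any(root.glob("*-journal"))

# Demo against a temporary directory standing in for the data folder.
with tempfile.TemporaryDirectory() as d:
    Path(d, "archive.db").touch()
    print(looks_busy(d))          # -> False: no sidecars, engine caught up
    Path(d, "archive.db-wal").touch()
    print(looks_busy(d))          # -> True: a write appears to be in progress
```

This is only a point-in-time hint; sidecar files can also linger briefly after a crash, so treat it as "probably busy" rather than proof.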
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
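Those archive filenames encode the device, the direction (input vs. output), and the recording start time. A small parser, with the pattern inferred from the example filenames above rather than any documented format:

```python
import re

# Pattern inferred from filenames like
# "System Audio (output)_2026-05-11_06-17-14.mp4" — not a documented spec.
PATTERN = re.compile(
    r"^(?P<device>.+) \((?P<direction>input|output)\)_"
    r"(?P<date>\d{4}-\d{2}-\d{2})_(?P<time>\d{2}-\d{2}-\d{2})\.mp4$"
)

def parse_recording(name):
    """Return device/direction/date/time from an archive filename, or None."""
    m = PATTERN.match(name)
    return m.groupdict() if m else None

info = parse_recording("System Audio (output)_2026-05-11_06-17-14.mp4")
print(info["device"], info["direction"], info["date"])
# -> System Audio output 2026-05-11
```

Sorting parsed records by date and time gives a per-device timeline of every capture session, which is handy for cross-checking the archive against what the database says was transcribed.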
1. The "Time Machine" Playback
How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. 
Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. 
How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. 
Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. 
The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. 
If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
|
-2999843694217658596
|
9138138095013776023
|
click
|
accessibility
|
NULL
|
You said
Explain to me: how does the ScreenPipe audio recording work? Where does it start? How is it getting transferred and transcribed? Is there some work-in-progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
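As a rough sketch of that chunking step (the 16 kHz sample rate and 30-second window below are illustrative assumptions, not ScreenPipe's actual settings):

```python
# Sketch: split a continuous audio stream into fixed-duration chunks.
# Sample rate and chunk length are illustrative, not ScreenPipe's real values.

SAMPLE_RATE = 16_000   # samples per second (assumed)
CHUNK_SECONDS = 30     # duration of each chunk handed to the transcriber (assumed)

def chunk_samples(samples: list[float]) -> list[list[float]]:
    """Break a flat list of PCM samples into transcription-sized chunks."""
    size = SAMPLE_RATE * CHUNK_SECONDS
    return [samples[i:i + size] for i in range(0, len(samples), size)]
```

A 70-second stream at these settings yields two full 30-second chunks plus a 10-second remainder.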
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: As it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
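That FTS5 mechanism can be tried directly with Python's built-in sqlite3 module; the `transcripts` table and its columns below are invented for the demo and are not ScreenPipe's real schema:

```python
import sqlite3

# Minimal demo of SQLite full-text search (FTS5), the indexing described above.
# Table and column names are invented; ScreenPipe's actual schema differs.
con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE transcripts USING fts5(text, timestamp)")
con.execute(
    "INSERT INTO transcripts VALUES (?, ?)",
    ("we agreed to ship the RAID migration next sprint", "2026-05-12T07:41:00"),
)

# MATCH uses the full-text index, so this stays fast even with years of audio.
rows = con.execute(
    "SELECT timestamp FROM transcripts WHERE transcripts MATCH 'raid'"
).fetchall()
```

The default FTS5 tokenizer is case-insensitive, which is why the lowercase query still matches "RAID" in the stored text.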
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
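The buffer, queue, and finalize steps above can be sketched with a worker thread draining a queue; `fake_transcribe` is a stub standing in for the Whisper call, and the whole thing is an illustration rather than ScreenPipe's actual code:

```python
import queue
import threading

# Illustration of the WIP pipeline: the capture side enqueues raw chunks,
# a worker drains the queue and "commits" results. Not ScreenPipe's real code.
chunk_queue = queue.Queue()
finalized = []  # stands in for rows committed to the SQLite database

def fake_transcribe(chunk: bytes) -> str:
    return f"transcript of {len(chunk)} bytes"

def worker() -> None:
    while True:
        chunk = chunk_queue.get()   # blocks until audio is buffered
        if chunk is None:           # sentinel: capture stopped
            break
        finalized.append(fake_transcribe(chunk))

t = threading.Thread(target=worker)
t.start()
for c in (b"aa", b"bbbb"):          # capture side produces chunks
    chunk_queue.put(c)
chunk_queue.put(None)
t.join()
```

If chunks arrive faster than the worker can transcribe, they simply accumulate in the queue, which is exactly the backlog behavior described above.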
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (…): This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The … or … folders: This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary … chunks, or locked database journals (like …), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
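One filesystem-level probe of that WIP state, assuming the database follows SQLite's standard journaling conventions (the helper name is mine, and a `-wal` file can also linger while idle, so treat this as a hint rather than proof):

```python
from pathlib import Path

# Heuristic: SQLite keeps `-wal` / `-journal` sidecar files next to a database
# while writes are in flight. Their presence *suggests* active processing.
def looks_busy(db_path: str) -> bool:
    db = Path(db_path)
    return any(
        db.with_name(db.name + suffix).exists()
        for suffix in ("-wal", "-journal")
    )
```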
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in the SQLite db?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text; it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
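A sketch of that lookup, against an invented two-column table (ScreenPipe's real tables are richer than this):

```python
import sqlite3

# Sketch: a text hit resolves to a media file plus a seek offset.
# Schema and values are invented for illustration.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE audio_transcriptions (file_path TEXT, offset_sec REAL, text TEXT)"
)
con.execute(
    "INSERT INTO audio_transcriptions VALUES (?, ?, ?)",
    ("System Audio (output)_2026-05-11_06-17-14.mp4", 312.5, "quarterly budget review"),
)

path, offset = con.execute(
    "SELECT file_path, offset_sec FROM audio_transcriptions WHERE text LIKE ?",
    ("%budget%",),
).fetchone()
# A player would now open `path` and seek to `offset` seconds before playing.
```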
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth to verify what was actually said.
How the Database Views Them
If you were to open up your db.sqlite and query the tables (e.g., audio_transcriptions), you would see that the database treats the ~/.screenpipe/data/ folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like soundcore AeroClip (input)_2026-05-12_07-40-48.mp4.
What happens if you delete them?
If you manually rm those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
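If you do prune files by hand, a quick audit of dangling references looks something like this (`audio_transcriptions.file_path` is an assumed name, so adjust it to whatever your actual schema uses):

```python
import sqlite3
from pathlib import Path

# Sketch: list transcripts whose referenced media file is gone from disk.
# The table/column names are assumptions about the schema, not a documented API.
def find_orphans(db_file: str) -> list[str]:
    con = sqlite3.connect(db_file)
    paths = [
        row[0]
        for row in con.execute("SELECT DISTINCT file_path FROM audio_transcriptions")
    ]
    con.close()
    return [p for p in paths if not Path(p).exists()]
```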
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection: ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving: If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly … older files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
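That cron job could be sketched as follows; the 30-day cutoff, directory paths, and function name are placeholders, and you would point `nas_dir` at your own mount:

```python
import shutil
import time
from pathlib import Path

# Sketch: move old media to the NAS and leave a symlink behind so the
# database's stored file paths keep resolving. Cutoff and paths are placeholders.
def archive_old_media(local_dir: str, nas_dir: str, max_age_days: int = 30) -> None:
    cutoff = time.time() - max_age_days * 86_400
    nas = Path(nas_dir)
    nas.mkdir(parents=True, exist_ok=True)
    for f in Path(local_dir).glob("*.mp4"):
        if f.is_file() and not f.is_symlink() and f.stat().st_mtime < cutoff:
            target = nas / f.name
            shutil.move(str(f), str(target))  # copy+delete works across filesystems
            f.symlink_to(target)              # original path stays valid
```

Note that the symlink targets an absolute path, so the NAS mount point must stay stable for old playback to keep working.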
You said
Yes, I will do that. Is there a way to set up languages for transcription?
settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share 
**Conversation with Gemini**

**You said:**

Explain to me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work-in-progress stage? How can I see from the folder structure what has been done and what hasn't been done?

**Gemini said:**

Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine. ScreenPipe is fundamentally designed to be a **100% local, privacy-first** application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.

**1. Where It Starts: The Capture Layer**

The process begins locally by tapping directly into your operating system's audio routing. ScreenPipe simultaneously captures two separate audio streams:

- **System Audio:** everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
- **Microphone Input:** everything you say into your mic.

Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
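The chunking step described above can be sketched in a few lines; the chunk length and sample rate below are illustrative assumptions, not ScreenPipe internals:

```python
# Sketch: slicing a continuous sample buffer into fixed-length chunks, as the
# capture layer described above might do before handing audio to the
# transcriber. Chunk length and sample rate are illustrative assumptions.

SAMPLE_RATE = 16_000      # samples per second (a typical speech-to-text rate)
CHUNK_SECONDS = 30        # assumed chunk length
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS

def chunk_stream(samples: list[int]) -> list[list[int]]:
    """Split a continuous sample buffer into transcription-sized chunks."""
    return [samples[i:i + CHUNK_SAMPLES]
            for i in range(0, len(samples), CHUNK_SAMPLES)]

# 70 seconds of silence -> two full 30 s chunks plus a 10 s remainder
chunks = chunk_stream([0] * (SAMPLE_RATE * 70))
```

The last chunk is simply shorter, which is why a recording session can end mid-chunk without losing audio.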
**2. How It Gets Transferred and Transcribed**

Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin. Here is the transcription pipeline:

- **The Engine:** the raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses **OpenAI Whisper** running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
- **Diarization:** as it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
- **Storage:** the final transcribed text is then indexed into a local **SQLite database**. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
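The FTS5 search described above is easy to demonstrate in isolation; this is a minimal sketch with an illustrative table name and columns, not ScreenPipe's actual schema (the conversation later mentions an `audio_transcriptions` table):

```python
# Sketch: how FTS5 full-text search over transcripts works in principle.
# Table name and columns are illustrative, not ScreenPipe's actual schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE transcripts USING fts5(timestamp, speaker, text)"
)
conn.executemany(
    "INSERT INTO transcripts VALUES (?, ?, ?)",
    [
        ("2026-05-12T07:41:02", "speaker_0", "let's move the backups to the RAID array"),
        ("2026-05-12T07:41:15", "speaker_1", "agreed, the NAS has plenty of space"),
    ],
)

# MATCH goes through the FTS5 index, so it stays fast over weeks of audio
rows = conn.execute(
    "SELECT timestamp, speaker FROM transcripts WHERE transcripts MATCH 'RAID'"
).fetchall()
```

The index lookup, rather than a full table scan, is what makes "a phrase from three weeks ago" effectively instant.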
**3. The "Work in Progress" (WIP) Stage**

There is a slight bottleneck between *hearing* the audio and *saving* the text, which acts as the WIP stage:

- **Buffering:** audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
- **Processing Queue:** Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
- **Finalization:** once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.

**4. Understanding the Folder Structure**

All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like `~/.screenpipe` on Mac/Linux, or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:

- **The SQLite database (`db.sqlite`):** this is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
- **The media folders (e.g., `data/`):** this is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
- **Temp files:** if you see rapidly changing files, temporary chunks, or locked database journals, this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.

Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
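The "done vs. in progress" check above can be scripted. This is a minimal sketch assuming the layout just described (a `db.sqlite` ledger plus a `data/` media folder); the `-wal`/`-journal` suffixes are SQLite's standard sidecar files, used here only as a heuristic, not anything ScreenPipe-specific:

```python
# Sketch: inspect a ScreenPipe-style directory and report processing state.
# Layout assumptions: a db.sqlite ledger and a data/ media folder. The
# -wal/-journal suffixes are SQLite's standard sidecars, created while a
# writer holds the database open, used here as a "work in progress" signal.
from pathlib import Path

def inspect(root: str) -> dict:
    root_path = Path(root)
    media = sorted(p.name for p in (root_path / "data").glob("*.mp4"))
    busy = any((root_path / f"db.sqlite{suffix}").exists()
               for suffix in ("-wal", "-journal"))
    return {"media_files": media, "processing_backlog": busy}
```

A `processing_backlog` of `True` only means a writer is active; the exact backlog depth lives inside the database itself.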
**You said:**

I see the files in `~/.screenpipe/data/` as `LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4` or `MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4` or `soundcore AeroClip (input)_2026-05-12_07-40-48.mp4` or `System Audio (output)_2026-05-11_06-17-14.mp4`. Are these used for anything after transcribing and storing in the SQLite DB?

**Gemini said:**

Yes, those `.mp4` files are absolutely still used after the transcription process is complete.
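The filenames in the question are regular enough to parse mechanically; a sketch assuming the `Device (input|output)_YYYY-MM-DD_HH-MM-SS.mp4` pattern holds for all of them:

```python
# Sketch: parse ScreenPipe-style media filenames into their parts.
# Assumes the "Device (input|output)_YYYY-MM-DD_HH-MM-SS.mp4" pattern
# seen in the question holds generally.
import re
from datetime import datetime

PATTERN = re.compile(
    r"^(?P<device>.+) \((?P<direction>input|output)\)_"
    r"(?P<stamp>\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})\.mp4$"
)

def parse_media_name(name: str) -> dict:
    m = PATTERN.match(name)
    if m is None:
        raise ValueError(f"unrecognized media filename: {name!r}")
    return {
        "device": m["device"],
        "direction": m["direction"],
        "recorded_at": datetime.strptime(m["stamp"], "%Y-%m-%d_%H-%M-%S"),
    }

info = parse_media_name("soundcore AeroClip (input)_2026-05-12_07-40-48.mp4")
```

This kind of parsing is handy for per-device retention rules, e.g. keeping system audio longer than microphone input.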
They are not temporary processing chunks; they act as your permanent media archive. Here is exactly what those files are doing sitting in your `~/.screenpipe/data/` directory and how they interact with the rest of the system.

**1. The "Time Machine" Playback**

The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the `db.sqlite` database along with **precise timestamps and file path references**. When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text: it uses the file path stored in the database to pull up that specific `.mp4` file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.

**2. Re-processing and Model Upgrades**

Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw `.mp4` files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
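A re-processing pass of the kind just described is essentially "walk the archive, re-transcribe, overwrite the old rows". In this minimal sketch the transcriber is injected as a callable, since the concrete model invocation (Whisper or otherwise) is outside the scope of this conversation:

```python
# Sketch: re-transcribe archived media with a newer model. The transcriber
# is injected as a callable because the concrete Whisper invocation is not
# specified here; paths and the return shape are illustrative.
from pathlib import Path
from typing import Callable

def reprocess_archive(
    data_dir: str,
    transcribe: Callable[[Path], str],
) -> dict[str, str]:
    """Run the given transcriber over every archived .mp4, keyed by filename."""
    results = {}
    for media in sorted(Path(data_dir).glob("*.mp4")):
        # A real pass would UPDATE the matching transcription rows in
        # db.sqlite here, keyed by the file path the database already stores.
        results[media.name] = transcribe(media)
    return results
```

In practice you would replace the old transcription rows keyed by file path, so full-text searches immediately pick up the improved text.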
**3. The Source of Truth for Hallucinations**

Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that `.mp4` file is your fallback source of truth to verify what was actually said.

**How the Database Views Them**

If you were to open up your `db.sqlite` and query the tables (e.g., `audio_transcriptions`), you would see that the database treats the `~/.screenpipe/data/` folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like `soundcore AeroClip (input)_2026-05-12_07-40-48.mp4`.

**What happens if you delete them?** If you manually `rm` those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.

**Managing the Storage Footprint**

Because ScreenPipe records continuously, this folder will inevitably grow over time.

- **Built-in garbage collection:** ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days, or when the folder hits a specific GB limit).
- **Archiving:** if you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly move older files to your 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
options","depth":23,"bounds":{"left":0.11801862,"top":0.029928172,"width":0.010638298,"height":0.025538707},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy prompt","depth":21,"bounds":{"left":0.12566489,"top":0.0905826,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Edit","depth":21,"bounds":{"left":0.14029256,"top":0.0905826,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said yes I will do that. Is there a way to setup languages to transcription?","depth":21,"bounds":{"left":0.16023937,"top":0.100159615,"width":0.15026596,"height":0.03830806},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"bounds":{"left":0.068484046,"top":0.1009577,"width":0.019946808,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"yes I will do that. Is there a way to setup languages to transcription?","depth":23,"bounds":{"left":0.16023937,"top":0.10175578,"width":0.12849069,"height":0.035514764},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"bounds":{"left":0.3025266,"top":0.17039107,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false}]...
|
-9025361045718474523
|
8487585703635086293
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Mute tab
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said:
Explain to me: how does the ScreenPipe audio recording work? Where does it start? How does it get transferred and transcribed? Is there a work-in-progress stage? How can I see from the folder structure what has been done and what hasn't?
Gemini said:
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine. ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system's audio routing. ScreenPipe simultaneously captures two separate audio streams:
- System Audio: everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
- Microphone Input: everything you say into your mic.
Because ScreenPipe runs as a 24/7 background process, it continuously listens to these streams and splits the audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
- The Engine: the raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure a cloud provider like Deepgram for faster processing, but local Whisper is the standard.)
- Diarization: as it transcribes the text, the engine also performs "diarization" (speaker identification). It analyzes the audio to distinguish your voice from the voices of others, labeling who said what.
- Storage: the final transcribed text is then indexed into a local SQLite database. This database uses full-text search (FTS5), which is what lets you instantly search for a phrase you heard in a meeting three weeks ago.
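The FTS5 mechanism described above can be sketched in a few lines with Python's built-in sqlite3 module. The table and column names here are illustrative stand-ins, not ScreenPipe's actual schema:

```python
import sqlite3

# Minimal FTS5 sketch: index transcript text, then search it with MATCH.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE transcripts USING fts5(text, timestamp UNINDEXED)"
)
conn.execute(
    "INSERT INTO transcripts VALUES (?, ?)",
    ("let's move the retrospective to Thursday", "2026-05-12T07:41:02"),
)
conn.execute(
    "INSERT INTO transcripts VALUES (?, ?)",
    ("deploy is blocked on the migration", "2026-05-12T09:15:44"),
)

# MATCH hits the inverted index, so lookups stay fast even at scale.
rows = conn.execute(
    "SELECT timestamp, text FROM transcripts WHERE transcripts MATCH ?",
    ("retrospective",),
).fetchall()
print(rows)  # only the row containing "retrospective"
```

This is why searching the transcript of a weeks-old meeting is near-instant: the query never has to scan the raw text linearly.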
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
- Buffering: audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
- Processing queue: Whisper requires CPU/GPU resources to transcribe. During a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
- Finalization: once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux, or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
- The SQLite database (db.sqlite): the master ledger. If text exists inside this database, the audio has been fully transcribed, diarized, and is "done".
- The data folder: where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
- Temp files: rapidly changing files, temporary audio chunks, or locked database journals (SQLite's -wal or -journal sidecar files) indicate the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
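A quick way to eyeball that WIP state is to scan the data directory for journal and temp files. This is a rough sketch; the glob patterns are assumptions, not ScreenPipe's documented naming scheme:

```python
from pathlib import Path
import tempfile

def pending_artifacts(data_dir: str) -> list[str]:
    """List likely in-progress artifacts (journals, temp chunks) under data_dir.

    The patterns below are guesses at typical naming, not a documented contract.
    """
    root = Path(data_dir)
    patterns = ("*.sqlite-wal", "*.sqlite-journal", "*.tmp")
    found: list[str] = []
    for pat in patterns:
        found.extend(p.name for p in root.rglob(pat))
    return sorted(found)

# Demo against a throwaway directory standing in for ~/.screenpipe.
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "db.sqlite-wal").touch()
    (Path(d) / "chunk_0012.tmp").touch()
    (Path(d) / "System Audio (output)_2026-05-11_06-17-14.mp4").touch()
    backlog = pending_artifacts(d)
    print(backlog)
```

An empty result suggests the engine has caught up; a long list of temp chunks means transcription is still chewing through a backlog.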
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said:
I see files in ~/.screenpipe/data/ such as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4, MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4, soundcore AeroClip (input)_2026-05-12_07-40-48.mp4, and System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in the SQLite db?
Gemini said:
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive. Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file-path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text: it uses the file path stored in the database to pull up that specific .mp4 file and start playing it at the exact second the keyword was spoken. Without these files you still get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or a radically better speaker-diarization algorithm is released, having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth for verifying what was actually said.
How the Database Views Them
If you were to open your db.sqlite and query the tables (e.g., audio_transcriptions), you would see that the database treats the ~/.screenpipe/data/ folder essentially as a blob-storage backend: the tables contain columns acting as soft foreign keys that point to filenames like soundcore AeroClip (input)_2026-05-12_07-40-48.mp4.
What happens if you delete them? If you manually rm those files to free up space on your Mac's SSD, ScreenPipe won't crash, and the SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
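Because the references are soft, you can audit for orphans yourself: walk the rows and check whether each referenced file still exists. The table and column names (audio_transcriptions, file_path) are assumptions about the schema, used here only for illustration:

```python
import os
import sqlite3
import tempfile

def orphaned_rows(conn: sqlite3.Connection) -> list[str]:
    """Return referenced media paths that no longer exist on disk.

    Assumes a table audio_transcriptions with a file_path column.
    """
    rows = conn.execute("SELECT file_path FROM audio_transcriptions").fetchall()
    return [path for (path,) in rows if not os.path.exists(path)]

# Demo: one row points at a real file, one at a deleted file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audio_transcriptions (file_path TEXT, text TEXT)")
real = tempfile.NamedTemporaryFile(suffix=".mp4", delete=False)
conn.execute("INSERT INTO audio_transcriptions VALUES (?, ?)", (real.name, "hello"))
conn.execute(
    "INSERT INTO audio_transcriptions VALUES (?, ?)",
    ("/nonexistent/System Audio (output)_2026-05-11_06-17-14.mp4", "world"),
)
missing = orphaned_rows(conn)
print(missing)  # only the /nonexistent/... path
os.unlink(real.name)
```

Rows that show up here are exactly the ones whose playback would silently fail.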
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
- Built-in garbage collection: ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days, or once the folder hits a specific GB limit).
- Archiving: if you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly move older files to your 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
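The move-and-symlink job could look something like the sketch below. The source and destination paths are placeholders for your setup; run it from cron once the NAS mount point is confirmed:

```shell
# archive_old_media <src_dir> <dest_dir> <age_days>
# Moves .mp4 files older than <age_days> from src to dest on the NAS,
# leaving a symlink behind so database file-path references keep resolving.
archive_old_media() {
  src=$1; dest=$2; age=$3
  mkdir -p "$dest"
  find "$src" -maxdepth 1 -type f -name '*.mp4' -mtime +"$age" |
  while IFS= read -r f; do
    base=$(basename "$f")
    mv "$f" "$dest/$base" && ln -s "$dest/$base" "$f"
  done
}

# Example (hypothetical paths):
# archive_old_media "$HOME/.screenpipe/data" /Volumes/nas/screenpipe-archive 30
```

Note that `mv` across filesystems copies then deletes, so the first run over a large backlog will be bound by your NAS write speed.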
You said:
Yes, I will do that. Is there a way to set up languages for transcription?
Gemini said:
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS, if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription: yes, you can configure it, but it helps to first understand how ScreenPipe handles language natively.
How the Transcription Engine Handles Language
By default, ScreenPipe uses a multilingual Whisper model and sets the language configuration to Auto-Detect. When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
- The advantage: if you are watching a tutorial in English, and an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
- The drawback: auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
If you want maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up processing, because the model skips the detection phase.
If you are using the Desktop App UI:
1. Open the ScreenPipe settings.
2. Navigate to the Audio or AI/Transcription section.
3. Look for the Whisper Language or Transcription Language dropdown.
4. Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak)...
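If you run the headless CLI instead of the desktop app, the equivalent setting is typically exposed as a startup flag. The flag name and value format below are assumptions, not verified against the ScreenPipe CLI; check the actual option list on your build first:

```
# Hypothetical invocation -- confirm the exact flag via `screenpipe --help`.
screenpipe --language slovak
```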
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXButton","text":"Mute 
tab","depth":5,"bounds":{"left":0.011469414,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.020113032,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: 
Conversation with Gemini

You said:
Explain to me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work-in-progress stage? How can I see from the folder structure what has been done and what hasn't been done?

Gemini said:
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.

ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer

The process begins locally by tapping directly into your operating system's audio routing. ScreenPipe simultaneously captures two separate audio streams:

- System Audio: everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
- Microphone Input: everything you say into your mic.

Because ScreenPipe runs as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
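The chunking step described above can be sketched in a few lines. This is an illustration only, not ScreenPipe's actual code; the sample rate and chunk length are assumed values:

```python
# Minimal sketch of fixed-duration audio chunking (illustrative only;
# not ScreenPipe's actual implementation).
SAMPLE_RATE = 16_000          # samples per second (assumed)
CHUNK_SECONDS = 30            # assumed chunk length
CHUNK_SIZE = SAMPLE_RATE * CHUNK_SECONDS

def chunk_samples(samples):
    """Split a flat sequence of PCM samples into fixed-size chunks."""
    return [samples[i:i + CHUNK_SIZE] for i in range(0, len(samples), CHUNK_SIZE)]

# A 70-second stream yields two full 30 s chunks plus a 10 s remainder.
stream = [0] * (SAMPLE_RATE * 70)
chunks = chunk_samples(stream)
print([len(c) // SAMPLE_RATE for c in chunks])  # chunk lengths in seconds
```

Each chunk then becomes one unit of work for the transcription stage that follows.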
2. How It Gets Transferred and Transcribed

Because ScreenPipe prioritizes local processing, the "transfer" step is extremely short: the audio data never leaves your machine unless you explicitly set up a cloud plugin.

The transcription pipeline:

- The Engine: the raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (You can also configure a cloud provider such as Deepgram for faster processing, but local Whisper is the standard.)
- Diarization: as it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
- Storage: the final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
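The FTS5 indexing described above can be sketched with Python's standard-library sqlite3 module. The table and column names here are hypothetical, not ScreenPipe's actual schema, and the snippet assumes an SQLite build with FTS5 enabled (standard CPython builds include it):

```python
import sqlite3

# Sketch of FTS5-backed transcript search (hypothetical schema, not
# ScreenPipe's actual tables).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE transcripts USING fts5(speaker, text, timestamp)"
)
conn.executemany(
    "INSERT INTO transcripts VALUES (?, ?, ?)",
    [
        ("speaker_0", "let's move the backlog review to Thursday", "2026-05-12T06:49:17"),
        ("speaker_1", "the RAID array resync finished overnight", "2026-05-12T06:50:02"),
    ],
)
# Full-text search: find the chunk where "backlog" was said.
row = conn.execute(
    "SELECT speaker, timestamp FROM transcripts WHERE transcripts MATCH 'backlog'"
).fetchone()
print(row)  # ('speaker_0', '2026-05-12T06:49:17')
```

The MATCH operator is what makes the "search a phrase from three weeks ago" use case fast: FTS5 maintains an inverted index rather than scanning every row.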
3. The "Work in Progress" (WIP) Stage

There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:

- Buffering: audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
- Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
- Finalization: once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.

4. Understanding the Folder Structure

All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux, or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:

- The SQLite database (db.sqlite): this is the master ledger. If text exists inside this database, the audio has been fully transcribed, diarized, and is "done."
- The data folder: this is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
- Temp files: if you see rapidly changing files, temporary chunks, or locked database journals, the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.

Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
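A rough way to eyeball "done" versus "in progress" from the folder is sketched below. The suffix conventions it checks (.mp4/.jpg for finished media, SQLite journal and temp-chunk suffixes for in-flight work) are assumptions drawn from the description above, not a documented ScreenPipe contract:

```python
import os
import tempfile

# Sketch: classify files in a ScreenPipe-style data directory into
# "finished media" vs "work in progress". Suffix conventions are
# illustrative assumptions, not a documented ScreenPipe contract.
WIP_SUFFIXES = (".sqlite-wal", ".sqlite-journal", ".tmp", ".part")

def classify(data_dir):
    done, in_progress = [], []
    for name in sorted(os.listdir(data_dir)):
        if name.endswith(WIP_SUFFIXES):
            in_progress.append(name)
        elif name.endswith((".mp4", ".jpg")):
            done.append(name)
    return done, in_progress

# Demo against a throwaway directory.
with tempfile.TemporaryDirectory() as d:
    for name in ["System Audio (output)_2026-05-11_06-17-14.mp4",
                 "db.sqlite-wal", "chunk_0041.tmp"]:
        open(os.path.join(d, name), "w").close()
    done, wip = classify(d)
    print(done)  # finished media
    print(wip)   # backlog still being processed
```

The presence of journal or temp files is only a heuristic for backlog: the database itself remains the authoritative record of what has been transcribed.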
You said:
I see the files in ~/.screenpipe/data/ as "LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4" or "MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4" or "soundcore AeroClip (input)_2026-05-12_07-40-48.mp4" or "System Audio (output)_2026-05-11_06-17-14.mp4". Are these used for anything after transcribing and storing in the SQLite db?

Gemini said:
Yes, those .mp4 files are absolutely still used after the transcription process is complete.
They are not temporary processing chunks; they act as your permanent media archive. Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory, and how they interact with the rest of the system.

1. The "Time Machine" Playback

The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references. When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text: it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files you still get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
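The timestamp lookup described above can be illustrated: given a chunk filename that encodes its start time (as the examples in ~/.screenpipe/data/ do) and an absolute timestamp from the database, the seek offset is just their difference. Parsing the start time out of the filename pattern is an assumption based on the example names in this conversation, not a documented ScreenPipe guarantee:

```python
from datetime import datetime

# Sketch: derive the playback seek offset inside a recorded chunk.
# Assumes the filename suffix "_YYYY-MM-DD_HH-MM-SS.mp4" encodes the
# chunk's start time, as in the example filenames above.
def seek_offset_seconds(filename, spoken_at):
    stamp = filename.rsplit(".", 1)[0][-19:]          # "2026-05-11_06-17-14"
    start = datetime.strptime(stamp, "%Y-%m-%d_%H-%M-%S")
    return (spoken_at - start).total_seconds()

offset = seek_offset_seconds(
    "System Audio (output)_2026-05-11_06-17-14.mp4",
    datetime(2026, 5, 11, 6, 19, 44),                 # timestamp from the DB
)
print(offset)  # 150.0 seconds into the file
```

In other words, the UI only needs the file path plus one absolute timestamp to jump straight to the moment a keyword was spoken.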
2. Re-processing and Model Upgrades

Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model (or if a radically better speaker diarization algorithm is released), having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.

3. The Source of Truth for Hallucinations

Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth to verify what was actually said.

How the Database Views Them

If you were to open up your db.sqlite and query the tables (e.g., audio_transcriptions), you would see that the database treats the ~/.screenpipe/data/ folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like "soundcore AeroClip (input)_2026-05-12_07-40-48.mp4".

What happens if you delete them? If you manually rm those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.

Managing the Storage Footprint

Because ScreenPipe records continuously, this folder will inevitably grow over time.

Built-in Garbage Collection: ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days, or when the folder hits a specific GB limit).

Archiving: if you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them.
You could script a simple cron job to regularly move older files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
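The archive-and-symlink idea above can be sketched with Python's standard library. The paths and the 30-day cutoff are illustrative; something like this would run from cron against the real data directory and NAS mount:

```python
import os
import shutil
import tempfile
import time

# Sketch: move media older than a cutoff to an archive location, leaving
# a symlink behind so database file-path references keep resolving.
# Paths, suffix, and cutoff are illustrative assumptions.
def archive_old_media(data_dir, archive_dir, max_age_days=30):
    cutoff = time.time() - max_age_days * 86400
    for name in os.listdir(data_dir):
        src = os.path.join(data_dir, name)
        if not name.endswith(".mp4") or os.path.islink(src):
            continue
        if os.path.getmtime(src) < cutoff:
            dst = os.path.join(archive_dir, name)
            shutil.move(src, dst)      # offload to the NAS mount
            os.symlink(dst, src)       # keep the original path valid

# Demo with a fake "old" file in throwaway directories.
data = tempfile.mkdtemp()
nas = tempfile.mkdtemp()
path = os.path.join(data, "System Audio (output)_2026-05-11_06-17-14.mp4")
open(path, "w").close()
os.utime(path, (0, 0))                 # pretend the file is decades old
archive_old_media(data, nas)
print(os.path.islink(path))            # True: local path now points at the archive
```

Skipping files that are already symlinks makes the job idempotent, so it is safe to run on a schedule.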
You said:
Yes, I will do that. Is there a way to set up languages for transcription?

Gemini said:
That is a solid plan.
Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.","depth":27,"bounds":{"left":0.0787899,"top":0.21747805,"width":0.23088431,"height":0.057861134},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As for setting up languages for transcription,","depth":27,"bounds":{"left":0.0787899,"top":0.28850758,"width":0.10920878,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"yes, you can configure it","depth":27,"bounds":{"left":0.18799867,"top":0.28850758,"width":0.06333112,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":", but it helps to understand how ScreenPipe handles language natively first.","depth":27,"bounds":{"left":0.0787899,"top":0.28850758,"width":0.20994017,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How the Transcription Engine Handles Language","depth":26,"bounds":{"left":0.0787899,"top":0.3499601,"width":0.234375,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How the Transcription Engine Handles Language","depth":27,"bounds":{"left":0.0787899,"top":0.35155627,"width":0.12549867,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses a","depth":27,"bounds":{"left":0.0787899,"top":0.37789306,"width":0.072972074,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"multilingual Whisper 
model","depth":27,"bounds":{"left":0.15176196,"top":0.37789306,"width":0.07047872,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"and sets the language configuration to","depth":27,"bounds":{"left":0.0787899,"top":0.37789306,"width":0.23321144,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Auto-Detect","depth":27,"bounds":{"left":0.08510638,"top":0.39864326,"width":0.032247342,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"bounds":{"left":0.11735372,"top":0.39864326,"width":0.0013297872,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.","depth":27,"bounds":{"left":0.0787899,"top":0.42817238,"width":0.23038563,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Advantage:","depth":29,"bounds":{"left":0.09142287,"top":0.47845173,"width":0.040724736,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.","depth":29,"bounds":{"left":0.09142287,"top":0.47845173,"width":0.22174202,"height":0.057861134},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The 
Drawback:","depth":29,"bounds":{"left":0.09142287,"top":0.5494813,"width":0.038896278,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.","depth":29,"bounds":{"left":0.09142287,"top":0.5494813,"width":0.20744681,"height":0.09936153},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How to Force a Specific Language","depth":26,"bounds":{"left":0.0787899,"top":0.67318434,"width":0.234375,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How to Force a Specific Language","depth":27,"bounds":{"left":0.0787899,"top":0.67478055,"width":0.08759973,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. 
This also slightly speeds up the processing because the model skips the detection phase.","depth":27,"bounds":{"left":0.0787899,"top":0.70111734,"width":0.2278923,"height":0.057861134},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you are using the Desktop App UI:","depth":27,"bounds":{"left":0.0787899,"top":0.7721468,"width":0.09275266,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Open the ScreenPipe settings.","depth":29,"bounds":{"left":0.09142287,"top":0.801676,"width":0.07347074,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Navigate to the","depth":29,"bounds":{"left":0.09142287,"top":0.8312051,"width":0.038231384,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio","depth":29,"bounds":{"left":0.12965426,"top":0.8312051,"width":0.014960106,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"bounds":{"left":0.14461437,"top":0.8312051,"width":0.0078125,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"AI/Transcription","depth":29,"bounds":{"left":0.15242687,"top":0.8312051,"width":0.041888297,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"section.","depth":29,"bounds":{"left":0.19431517,"top":0.8312051,"width":0.02044548,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Look for 
the","depth":29,"bounds":{"left":0.09142287,"top":0.8607342,"width":0.030585106,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper Language","depth":29,"bounds":{"left":0.12200798,"top":0.8607342,"width":0.04837101,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"bounds":{"left":0.17037898,"top":0.8607342,"width":0.0078125,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Transcription Language","depth":29,"bounds":{"left":0.17819148,"top":0.8607342,"width":0.061502658,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"dropdown.","depth":29,"bounds":{"left":0.23969415,"top":0.8607342,"width":0.027260639,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Change it from \"Auto\" to your specific language (e.g., English, Bulgarian, or Slovak).","depth":29,"bounds":{"left":0.09142287,"top":0.8902634,"width":0.20079787,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
You said
Explain to me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work-in-progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: As it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
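The FTS5 pattern can be reproduced in a few lines. This is a toy sketch, not ScreenPipe's real schema: the `transcripts` table and its columns are invented for illustration.

```python
import sqlite3

# Toy reproduction of the FTS5 search pattern; table and column names are
# illustrative, not ScreenPipe's actual schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE transcripts USING fts5(text, timestamp UNINDEXED)")
conn.executemany(
    "INSERT INTO transcripts VALUES (?, ?)",
    [
        ("let us review the quarterly roadmap", "2026-05-12T09:00:00"),
        ("lunch order for the team", "2026-05-12T12:00:00"),
    ],
)

# MATCH consults the full-text index instead of scanning every row.
hits = conn.execute(
    "SELECT timestamp FROM transcripts WHERE transcripts MATCH 'roadmap'"
).fetchall()
print(hits)  # only the meeting row matches
```

The point is that MATCH runs against a prebuilt full-text index rather than scanning every transcript, which is what keeps searches instant over months of audio.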
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux, or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (db.sqlite): This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The media folders (such as ~/.screenpipe/data/): This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary chunk files, or locked database journals, this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
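A rough triage of that directory can be scripted. In the sketch below, the .mp4 pattern matches the recordings ScreenPipe actually writes, and the -wal/-shm/-journal suffixes are standard SQLite sidecar conventions; treating everything else as in-progress is a heuristic guess, not ScreenPipe's documented behavior.

```python
from pathlib import Path

# Heuristic triage of entries in the ScreenPipe data directory. The .mp4
# pattern matches the recordings ScreenPipe writes; the -wal/-shm/-journal
# suffixes are standard SQLite sidecar files; anything else is treated as a
# possible work-in-progress chunk. This is a convention guess, not a spec.
def triage(path: Path) -> str:
    name = path.name
    if name.endswith(("-wal", "-shm", "-journal")):
        return "active database journal (engine busy)"
    if name.endswith(".mp4"):
        return "archived media (fully written out)"
    return "possible work-in-progress chunk"

for name in [
    "db.sqlite-wal",
    "System Audio (output)_2026-05-11_06-17-14.mp4",
    "chunk_0042.tmp",
]:
    print(name, "->", triage(Path(name)))
```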
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text: it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth to verify what was actually said.
How the Database Views Them
If you were to open up your db.sqlite and query the tables (e.g., audio_transcriptions), you would see that the database treats the ~/.screenpipe/data/ folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like soundcore AeroClip (input)_2026-05-12_07-40-48.mp4.
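Here is a minimal sketch of that soft-foreign-key layout, using an in-memory toy table. The column names are assumptions; inspect your own database with `.schema audio_transcriptions` to see the real ones.

```python
import sqlite3

# Toy model of the "soft foreign key" layout: the table stores plain file
# paths next to the transcribed text. Column names are assumptions; run
# `.schema audio_transcriptions` against your own db.sqlite for the real ones.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audio_transcriptions "
    "(transcription TEXT, file_path TEXT, offset_seconds REAL)"
)
conn.execute(
    "INSERT INTO audio_transcriptions VALUES (?, ?, ?)",
    ("yes I will do that", "soundcore AeroClip (input)_2026-05-12_07-40-48.mp4", 83.5),
)

# Playback looks up the stored path and seeks to the offset; nothing in the
# schema enforces that the file still exists on disk.
row = conn.execute(
    "SELECT file_path, offset_seconds FROM audio_transcriptions "
    "WHERE transcription LIKE '%will do%'"
).fetchone()
print(row)
```

Because the path is just a text column, deleting the media files never breaks the database itself; only playback fails.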
What happens if you delete them?
If you manually rm those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection: ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving: If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly move older files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
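The move-and-symlink step such a cron job would perform can be sketched like this, using throwaway temp directories in place of the SSD and the NAS mount. The mount path and the age cutoff are yours to choose; this only shows the mechanics.

```python
import shutil
import tempfile
from pathlib import Path

# Sketch of the archive step: move an old recording to the NAS mount and leave
# a symlink behind so the file paths stored in db.sqlite keep resolving.
def offload(local: Path, nas_root: Path) -> Path:
    dest = nas_root / local.name
    shutil.move(str(local), str(dest))  # works across filesystems (copy + unlink)
    local.symlink_to(dest)              # playback follows the link transparently
    return dest

# Demo with temp directories standing in for the local SSD and the NAS.
base = Path(tempfile.mkdtemp())
nas = base / "nas"
nas.mkdir()
recording = base / "System Audio (output)_2026-05-11_06-17-14.mp4"
recording.write_bytes(b"fake media")
offload(recording, nas)
print(recording.is_symlink(), recording.read_bytes())
```

Reading through the symlink returns the original bytes, so the database's stored path keeps working after the move.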
You said
Yes, I will do that. Is there a way to set up languages for transcription?
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription: yes, you can configure it, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
By default, ScreenPipe uses a multilingual Whisper model and sets the language configuration to Auto-Detect. When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage: If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback: Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
Open the ScreenPipe settings.
Navigate to the Audio or AI/Transcription section.
Look for the Whisper Language or Transcription Language dropdown.
Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Expand
Listen
Show thinking
Gemini said
Gemini said
Yes, those
.mp4
files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your
~/.screenpipe/data/
directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the
db.sqlite
database along with
precise timestamps and file path references
.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific
.mp4
file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw
.mp4
files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that
.mp4
file is your fallback source of truth to verify what was actually said.
How the Database Views Them
How the Database Views Them
If you were to open up your
db.sqlite
and query the tables (e.g.,
audio_transcriptions
), you would see that the database treats the
~/.screenpipe/data/
folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like
soundcore AeroClip (input)_2026-05-12_07-40-48.mp4
.
What happens if you delete them?
If you manually
rm
those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
Managing the Storage Footprint
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection:
ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving:
If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly
older
files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
Edit
You said yes I will do that. Is there a way to setup languages to transcription?
You said
yes I will do that. Is there a way to setup languages to transcription?
Listen
Show thinking
Gemini said
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription, yes, you can configure it, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
By default, ScreenPipe uses a multilingual Whisper model and sets the language configuration to Auto-Detect.
When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage:
If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback:
Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
1. Open the ScreenPipe settings.
2. Navigate to the Audio or AI/Transcription section.
3. Look for the Whisper Language or Transcription Language dropdown.
4. Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).
If you are running ScreenPipe via CLI/Config:
You can modify your underlying configuration (usually found in ~/.screenpipe/pipe.json or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes: sk (Slovak), bg (Bulgarian), en (English).
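As a purely illustrative sketch of the config-file route: the pipe.json location comes from the answer above, but the "language" key name is an assumption, so check the actual ScreenPipe config schema before relying on it.

```python
import json
import os

# Hypothetical sketch: write a forced language code into a JSON config.
# The key name "language" is assumed, not taken from ScreenPipe docs.
def set_transcription_language(config_path, code):
    if os.path.exists(config_path):
        with open(config_path) as f:
            cfg = json.load(f)
    else:
        cfg = {}
    cfg["language"] = code  # e.g. "sk", "bg", or "en"
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=2)
    return cfg
```

Restarting the daemon after a change like this would be needed for any config edit to take effect.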
If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
Good response
Bad response
Redo
Share and export
Copy
Show more options
Ask Gemini
Ask Gemini
Open upload file menu
Tools
Open mode picker
Pro
Microphone
Send message
Gemini is AI and can make mistakes, including about people.
Your privacy and Gemini Opens in a new window
Your privacy and Gemini
Opens in a new window
Summarize page
Summarize page
Screenpipe [archive.db · 2071.1MB]
Screenpipe
[archive.db · 2071.1MB]
Activity
Search
Audio
Work Report
Timetable
AI Summary
Date
12
/
05
/
2026
Calendar
Monitor
Jump to
--
:
--
Go
📅 Time Range Filter ▼
📅 Time Range Filter
▼
From:
09
:
45
To:
10
:...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXButton","text":"Mute 
tab","depth":5,"bounds":{"left":0.011469414,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.020113032,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: 
db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New 
Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat 
settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share 
conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? 
How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. 
Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. 
How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. 
Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. 
The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. 
If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. 
If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). 
Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy 
prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. 
Are these used for anything after transcribing and storing in sqlite db?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Expand","depth":21,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Yes, those","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files are absolutely still used after the transcription process is complete. 
They are not temporary processing chunks; they act as your permanent media archive.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is exactly what those files are doing sitting in your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"directory and how they interact with the rest of the system.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"1. The \"Time Machine\" Playback","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. The \"Time Machine\" Playback","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The primary reason ScreenPipe keeps these files is for audio playback. 
When Whisper transcribes your meetings or ambient audio, it writes the text into the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"database along with","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"precise timestamps and file path references","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. Re-processing and Model Upgrades","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. Re-processing and Model Upgrades","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Local LLMs and transcription models are improving rapidly. 
Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.

3. The Source of Truth for Hallucinations

Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise.
If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth to verify what was actually said.

How the Database Views Them

If you were to open up your db.sqlite and query the tables (e.g., audio_transcriptions), you would see that the database treats the ~/.screenpipe/data/ folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like soundcore AeroClip (input)_2026-05-12_07-40-48.mp4.

What happens if you delete them? If you manually rm those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text.
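The soft-foreign-key layout described above can be sketched with an in-memory database. Note that the table and column names here (audio_chunks, file_path, audio_chunk_id, offset_index) are assumptions for illustration, not ScreenPipe's verified schema:

```python
import sqlite3

# Minimal sketch of the "blob storage backend" pattern: the database holds
# searchable text plus a plain-text path into ~/.screenpipe/data/, with no
# enforced foreign-key constraint. Names are hypothetical, not the real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE audio_chunks (
        id INTEGER PRIMARY KEY,
        file_path TEXT,          -- soft link into the media folder
        timestamp TEXT
    );
    CREATE TABLE audio_transcriptions (
        audio_chunk_id INTEGER,  -- references audio_chunks.id (not enforced)
        transcription TEXT,
        offset_index INTEGER     -- position of this snippet within the chunk
    );
""")
conn.execute(
    "INSERT INTO audio_chunks VALUES (1, "
    "'soundcore AeroClip (input)_2026-05-12_07-40-48.mp4', '2026-05-12T07:40:48')")
conn.execute(
    "INSERT INTO audio_transcriptions VALUES (1, 'let us check the NAS', 0)")

# A keyword search resolves matching text back to the media file it came from,
# which is exactly the lookup the playback UI performs.
row = conn.execute("""
    SELECT t.transcription, c.file_path
    FROM audio_transcriptions t
    JOIN audio_chunks c ON c.id = t.audio_chunk_id
    WHERE t.transcription LIKE '%NAS%'
""").fetchone()
print(row)
```

If the file named in `file_path` has been deleted from disk, this query still succeeds, which is why the text stays searchable while playback quietly breaks.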
However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.

Managing the Storage Footprint

Because ScreenPipe records continuously, this folder will inevitably grow over time.

Built-in Garbage Collection: ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).

Archiving: If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them.
You could script a simple cron job to regularly move older files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
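The move-then-symlink step that such a cron job would perform can be sketched as follows. The directory paths and the 30-day threshold are assumptions; point `data_dir` at your actual ~/.screenpipe/data and `archive_dir` at the NAS mount:

```python
import shutil
import time
from pathlib import Path

def archive_old_media(data_dir: Path, archive_dir: Path,
                      max_age_days: int = 30) -> list[Path]:
    """Move media files older than max_age_days to archive_dir and leave a
    symlink behind, so file-path references in the database keep resolving."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    cutoff = time.time() - max_age_days * 86400
    moved = []
    # sorted() materializes the listing before we start mutating the directory
    for f in sorted(data_dir.iterdir()):
        if f.is_file() and not f.is_symlink() and f.stat().st_mtime < cutoff:
            target = archive_dir / f.name
            shutil.move(str(f), target)  # copy to the archive, remove the local file
            f.symlink_to(target)         # local path still resolves for playback
            moved.append(target)
    return moved
```

Scheduled via cron, this keeps the local SSD lean: playback follows the symlink to the NAS, at the cost of needing the NAS mounted whenever you replay archived audio.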
You said
Yes, I will do that. Is there a way to set up languages for transcription?

Gemini said
That is a solid plan.
Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.

As for setting up languages for transcription, yes, you can configure it, but it helps to understand how ScreenPipe handles language natively first.

How the Transcription Engine Handles Language

By default, ScreenPipe uses a multilingual Whisper model and sets the language configuration to Auto-Detect. When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.

The Advantage: If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.

The Drawback: Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.

How to Force a Specific Language

If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language.
This also slightly speeds up the processing because the model skips the detection phase.

If you are using the Desktop App UI:

1. Open the ScreenPipe settings.
2. Navigate to the Audio or AI/Transcription section.
3. Look for the Whisper Language or Transcription Language dropdown.
4. Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).

If you are running ScreenPipe via CLI/Config: You can modify your underlying configuration (usually found in ~/.screenpipe/pipe.json or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:

sk (Slovak)
bg (Bulgarian)
en (English)

If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
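A small helper for editing that JSON config could look like the sketch below. The `"language"` key name is an assumption for illustration only; verify the actual key in your ScreenPipe configuration before relying on it:

```python
import json
from pathlib import Path

# ISO 639-1 codes for the languages this setup actually records in,
# plus the auto-detect default.
SUPPORTED = {"sk": "Slovak", "bg": "Bulgarian", "en": "English", "auto": "Auto-Detect"}

def set_transcription_language(config_path: Path, code: str) -> dict:
    """Write a forced transcription language into a JSON config file.
    The 'language' key is hypothetical; ScreenPipe's real key may differ."""
    if code not in SUPPORTED:
        raise ValueError(f"unknown language code: {code!r}")
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config["language"] = code
    config_path.write_text(json.dumps(config, indent=2))
    return config
```

Validating against a small allow-list like this catches typos (e.g., `sl` is Slovenian, not Slovak) before the daemon silently transcribes everything in the wrong language.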
|
-3579967252539605047
|
8632611289464187869
|
visual_change
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Mute tab
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said
Explain to me: how does the ScreenPipe audio recording work? Where does it start? How is it getting transferred and transcribed? Is there some work-in-progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
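As a rough illustration of that chunking step, here is a minimal sketch: a generator that slices a continuous PCM byte stream into fixed-length chunks. The sample rate, sample width, and chunk length are illustrative assumptions, not ScreenPipe's actual values (its capture loop is native code).

```python
# Illustrative only: shows the "continuous stream -> fixed chunks" idea.
SAMPLE_RATE = 16_000       # assumed sample rate (Hz)
BYTES_PER_SAMPLE = 2       # assumed 16-bit mono PCM
CHUNK_SECONDS = 30         # assumed chunk length

def chunk_stream(stream, chunk_seconds=CHUNK_SECONDS):
    """Yield fixed-size byte chunks from an iterable of audio buffers."""
    chunk_size = SAMPLE_RATE * BYTES_PER_SAMPLE * chunk_seconds
    buf = bytearray()
    for block in stream:
        buf.extend(block)
        while len(buf) >= chunk_size:
            yield bytes(buf[:chunk_size])
            del buf[:chunk_size]
    if buf:                # flush the trailing partial chunk
        yield bytes(buf)
```

Each yielded chunk is what would be handed to the transcription stage.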
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: As it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
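To make the FTS5 point concrete, here is a toy sketch using Python's built-in sqlite3. The table name and columns are simplified assumptions for illustration; the real db.sqlite schema differs.

```python
import sqlite3

# Assumed, simplified schema -- the point is that an FTS5 index over the
# transcript text makes it instantly searchable by phrase.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE VIRTUAL TABLE transcripts "
    "USING fts5(timestamp, file_path, transcription)")
con.execute(
    "INSERT INTO transcripts VALUES ("
    "'2026-05-12T07:41:02', "
    "'soundcore AeroClip (input)_2026-05-12_07-40-48.mp4', "
    "'let us move the backups to the RAID array tonight')")

def search(phrase):
    """Rows whose transcription matches the FTS query, best match first."""
    return con.execute(
        "SELECT timestamp, file_path FROM transcripts "
        "WHERE transcripts MATCH ? ORDER BY rank",
        (f'transcription:"{phrase}"',)).fetchall()

print(search("RAID array"))
```

A query like `search("RAID array")` returns the timestamp and media file path of every matching utterance, which is exactly what powers the instant history search.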
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
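The three steps above can be modeled as a tiny producer/worker queue. This is a toy sketch, not ScreenPipe's implementation: `transcribe` is a stand-in for the expensive Whisper call, and appending to `results` stands in for the database commit.

```python
import queue
import threading

work = queue.Queue()   # captured chunks wait here (the WIP backlog)
results = []           # stand-in for rows committed to SQLite

def transcribe(chunk):
    """Stand-in for the real Whisper call."""
    return f"text for {chunk}"

def worker():
    while True:
        chunk = work.get()
        if chunk is None:          # sentinel: capture has stopped
            break
        results.append(transcribe(chunk))   # "finalization"

t = threading.Thread(target=worker)
t.start()
for chunk_id in range(3):          # the capture side produces chunks
    work.put(f"chunk-{chunk_id}")
work.put(None)
t.join()
print(results)
```

When the producer outpaces the single worker, chunks simply accumulate in the queue, which is the backlog you would observe as the WIP stage.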
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux, or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (db.sqlite): This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data/ folder: This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary audio chunks, or locked database journals (like db.sqlite-wal), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
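If you want to check for those WIP signals programmatically, a small heuristic script could look like this. The journal glob pattern assumes standard SQLite naming (`<db>-wal`, `<db>-journal`); the "recently written" window is an arbitrary choice.

```python
import time
from pathlib import Path

def wip_signals(data_dir, recent_seconds=60):
    """Heuristic: journal files or very recent writes suggest active processing."""
    root = Path(data_dir)
    # e.g. db.sqlite-wal, db.sqlite-journal (standard SQLite side files)
    journals = [p.name for p in root.glob("*.sqlite-*")]
    now = time.time()
    recent = [p.name for p in root.rglob("*")
              if p.is_file() and now - p.stat().st_mtime < recent_seconds]
    return {"journals": journals, "recently_written": recent}
```

Pointing it at ~/.screenpipe would list journal files and anything written in the last minute; an empty result suggests the engine has caught up.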
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in the SQLite DB?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text: it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
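Since the filenames in ~/.screenpipe/data/ encode the recording start time, the seek position can in principle be derived from the transcript's absolute timestamp. Whether the UI computes it exactly this way is an assumption; the filename format comes from your own examples.

```python
from datetime import datetime

def seek_offset_seconds(file_name, transcript_ts):
    """Offset into the media file for an absolute transcript timestamp.

    File names like 'System Audio (output)_2026-05-11_06-17-14.mp4'
    encode the recording start time in their last two '_' fields.
    """
    stem = file_name.rsplit(".", 1)[0]
    date_part, time_part = stem.split("_")[-2:]
    start = datetime.strptime(f"{date_part}_{time_part}", "%Y-%m-%d_%H-%M-%S")
    return (transcript_ts - start).total_seconds()

print(seek_offset_seconds(
    "System Audio (output)_2026-05-11_06-17-14.mp4",
    datetime(2026, 5, 11, 6, 19, 44)))   # 150.0 seconds into the file
```

Handing that offset to any media player gets you playback at the exact second the keyword was spoken.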
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth to verify what was actually said.
How the Database Views Them
If you were to open up your db.sqlite and query the tables (e.g., audio_transcriptions), you would see that the database treats the ~/.screenpipe/data/ folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like soundcore AeroClip (input)_2026-05-12_07-40-48.mp4.
What happens if you delete them?
If you manually rm those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection: ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving: If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly move older files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
You said
Yes, I will do that. Is there a way to set up languages for transcription?
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription, yes, you can configure it, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
By default, ScreenPipe uses a multilingual Whisper model and sets the language configuration to Auto-Detect.
When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage: If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback: Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
Open the ScreenPipe settings.
Navigate to the Audio or AI/Transcription section.
Look for the Whisper Language or Transcription Language dropdown.
Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).
If you are running ScreenPipe via CLI/Config:
You can modify your underlying configuration (usually found in ~/.screenpipe/pipe.json or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:
sk (Slovak)
bg (Bulgarian)
en (English)
If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
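If you go the config-file route, here is a sketch of setting the code programmatically. Note the "language" key name is a guess, not the documented pipe.json schema; inspect your actual config file before relying on this.

```python
import json
from pathlib import Path

# Hypothetical: the "language" key is an assumption about the config
# schema, not documented pipe.json structure.
def set_transcription_language(config_path, lang_code):
    """Set an ISO 639-1 code (e.g. 'sk', 'bg', 'en') in a JSON config file."""
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    config["language"] = lang_code
    path.write_text(json.dumps(config, indent=2))
    return config

# e.g. set_transcription_language(Path.home() / ".screenpipe/pipe.json", "sk")
```

Restarting the daemon after such a change would be required for it to take effect.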
The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. 
If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. 
If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). 
Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy 
prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"}]...
|
1656801680111661218
|
9212450652042222487
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Mute tab
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said
Explain to me: how does ScreenPipe audio recording work? Where does it start? How does the audio get transferred and transcribed? Is there a work-in-progress stage? And how can I tell from the folder structure what has and hasn't been processed?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
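That chunking step can be sketched in plain Python. Everything below (the 16 kHz sample rate, the 30-second chunk length, the names) is an illustrative assumption, not ScreenPipe's actual configuration:

```python
from dataclasses import dataclass

SAMPLE_RATE = 16_000   # samples per second; typical for speech models (assumption)
CHUNK_SECONDS = 30     # assumed chunk length, not ScreenPipe's real setting
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS

@dataclass
class Chunk:
    start_sample: int  # position of this chunk within the overall stream
    samples: list

def chunk_stream(stream):
    """Accumulate incoming sample blocks and yield fixed-size chunks."""
    pending, offset = [], 0
    for block in stream:  # each block is a list of raw samples
        pending.extend(block)
        while len(pending) >= CHUNK_SAMPLES:
            yield Chunk(offset, pending[:CHUNK_SAMPLES])
            pending = pending[CHUNK_SAMPLES:]
            offset += CHUNK_SAMPLES

# 70 seconds of (silent) audio arriving in 1-second blocks yields two full
# 30 s chunks; the trailing 10 s stay buffered until more audio arrives.
blocks = ([0.0] * SAMPLE_RATE for _ in range(70))
chunks = list(chunk_stream(blocks))
print(len(chunks))  # 2
```

The buffered remainder is exactly the "work in progress" material discussed later: audio that has been captured but not yet handed to the transcription stage.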
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure a cloud provider like Deepgram for faster processing, but local Whisper is the standard.)
Diarization: As it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish your voice from the voices of others, labeling who said what.
Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
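The FTS5 pattern described above can be tried directly with Python's bundled sqlite3 module. The table name and columns here are illustrative, not ScreenPipe's real schema:

```python
import sqlite3

db = sqlite3.connect(":memory:")
# An FTS5 virtual table indexes every word of the text column.
db.execute("CREATE VIRTUAL TABLE transcripts USING fts5(text, timestamp)")
db.execute("INSERT INTO transcripts VALUES (?, ?)",
           ("let's move the standup to thursday", "2026-05-12T09:00:00"))
db.execute("INSERT INTO transcripts VALUES (?, ?)",
           ("the NAS rebuild finished overnight", "2026-05-12T10:30:00"))

# A MATCH query uses the full-text index rather than scanning every row,
# which is what makes phrase lookup over months of audio feel instant.
row = db.execute(
    "SELECT text, timestamp FROM transcripts WHERE transcripts MATCH ?",
    ("standup",),
).fetchone()
print(row)  # ("let's move the standup to thursday", "2026-05-12T09:00:00")
```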
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on disk.
Processing Queue: Whisper needs CPU/GPU resources to transcribe. During a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the model finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux, or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite database (db.sqlite): This is the master ledger. If text exists inside this database, the audio has been fully transcribed, diarized, and is "done."
The media folders (e.g., ~/.screenpipe/data/): This is where the compressed raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary audio chunks, or locked database journals (such as SQLite's -wal or -journal files), the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
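One way to act on this "master ledger" idea is to diff the media files on disk against the file paths the database already references: anything on disk with no matching row is still pending. A sketch with a hypothetical table and column names (inspect your own db.sqlite for the real schema):

```python
import sqlite3

def pending_files(media_files, db):
    """Return media files that have no transcription row yet."""
    rows = db.execute("SELECT file_path FROM transcriptions").fetchall()
    transcribed = {path for (path,) in rows}
    return sorted(set(media_files) - transcribed)

# Hypothetical stand-in for db.sqlite with one processed recording.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE transcriptions (file_path TEXT, text TEXT)")
db.execute("INSERT INTO transcriptions VALUES (?, ?)",
           ("System Audio (output)_2026-05-11_06-17-14.mp4", "..."))

on_disk = [
    "System Audio (output)_2026-05-11_06-17-14.mp4",
    "MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4",
]
print(pending_files(on_disk, db))
# ['MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4']
```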
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see files in ~/.screenpipe/data/ such as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4, MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4, soundcore AeroClip (input)_2026-05-12_07-40-48.mp4, or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after they are transcribed and stored in the SQLite db?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file-path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text: it uses the file path stored in the database to pull up that specific .mp4 file and start playing it at the exact second the keyword was spoken. Without these files you still get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
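Since each recording's filename encodes its start time (the format visible in the filenames quoted in the question above, ending in _YYYY-MM-DD_HH-MM-SS), the seek offset for a keyword hit can be computed from the absolute timestamp stored in the database. A sketch; the function name is mine, not ScreenPipe's:

```python
from datetime import datetime

def seek_offset(filename: str, keyword_time: datetime) -> float:
    """Seconds into the file at which the keyword was spoken."""
    # The last 19 characters before the extension are the start stamp,
    # e.g. "2026-05-11_06-17-14".
    stamp = filename.rsplit(".", 1)[0][-19:]
    start = datetime.strptime(stamp, "%Y-%m-%d_%H-%M-%S")
    return (keyword_time - start).total_seconds()

offset = seek_offset(
    "System Audio (output)_2026-05-11_06-17-14.mp4",
    datetime(2026, 5, 11, 6, 20, 0),   # keyword timestamp from the DB
)
print(offset)  # 166.0 -> start playback 2 min 46 s into the file
```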
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load against accuracy. If you later decide to run a heavier, more accurate model, or a radically better speaker-diarization algorithm is released, having the raw .mp4 files allows you to re-process the historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth for verifying what was actually said.
How the Database Views Them
If you open db.sqlite and query the tables (e.g., audio_transcriptions), you will see that the database treats the ~/.screenpipe/data/ folder essentially as a blob-storage backend. The tables contain columns acting as soft foreign keys that point to filenames like soundcore AeroClip (input)_2026-05-12_07-40-48.mp4.
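A minimal sketch of that soft-foreign-key resolution, using hypothetical table and column names (the real schema may differ): the transcript row stores only an identifier, and resolving it to playable media is an application-level join, with nothing at the SQL level enforcing that the file still exists on disk.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE audio_chunks (id INTEGER PRIMARY KEY, file_path TEXT);
    CREATE TABLE audio_transcriptions (
        audio_chunk_id INTEGER,   -- soft reference: no FOREIGN KEY constraint
        transcription  TEXT
    );
    INSERT INTO audio_chunks VALUES
        (1, 'soundcore AeroClip (input)_2026-05-12_07-40-48.mp4');
    INSERT INTO audio_transcriptions VALUES (1, 'ship it on friday');
""")

# Resolve a transcript hit back to the media file to open for playback.
row = db.execute("""
    SELECT c.file_path, t.transcription
    FROM audio_transcriptions t
    JOIN audio_chunks c ON c.id = t.audio_chunk_id
    WHERE t.transcription LIKE '%friday%'
""").fetchone()
print(row[0])  # the .mp4 the UI would play
```

Deleting the .mp4 silently breaks that join's usefulness: the row still returns, but the path it points at is gone.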
What happens if you delete them?
If you manually...
How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. 
Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. 
How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. 
Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. 
The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. 
If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. 
If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). 
Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy 
prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. 
Are these used for anything after transcribing and storing in sqlite db?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Expand","depth":21,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Yes, those","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files are absolutely still used after the transcription process is complete. 
They are not temporary processing chunks; they act as your permanent media archive.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is exactly what those files are doing sitting in your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"directory and how they interact with the rest of the system.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"1. The \"Time Machine\" Playback","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. The \"Time Machine\" Playback","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The primary reason ScreenPipe keeps these files is for audio playback. 
When Whisper transcribes your meetings or ambient audio, it writes the text into the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"database along with","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"precise timestamps and file path references","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. Re-processing and Model Upgrades","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. Re-processing and Model Upgrades","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Local LLMs and transcription models are improving rapidly. 
Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The Source of Truth for Hallucinations","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The Source of Truth for Hallucinations","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. 
If a database query returns a completely nonsensical sentence, that","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file is your fallback source of truth to verify what was actually said.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How the Database Views Them","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How the Database Views Them","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you were to open up your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"and query the tables (e.g.,","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"audio_transcriptions","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), you would see that the database treats the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folder essentially as a blob storage backend. 
The tables contain columns acting as soft foreign keys pointing to filenames like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"soundcore AeroClip (input)_2026-05-12_07-40-48.mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"What happens if you delete them?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you manually","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
Conversation with Gemini
You said
Explain to me: how does ScreenPipe audio recording work? Where does it start? How does the audio get transferred and transcribed? Is there a work-in-progress stage? And how can I tell from the folder structure what has and hasn't been processed?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine. ScreenPipe is fundamentally designed to be a 100% local, privacy-first application: the vast majority of its processing happens directly on your hardware, without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: everything your computer outputs (e.g., the voices of other participants in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
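The chunking step above can be sketched as a plain generator. The 30-second chunk length and 16 kHz sample rate here are illustrative assumptions, not ScreenPipe's actual settings:

```python
# Split a continuous mono PCM stream into fixed-length chunks for the
# transcription queue. Chunk length and sample rate are assumptions.
SAMPLE_RATE = 16_000      # samples per second
CHUNK_SECONDS = 30

def chunk_stream(samples):
    """Yield successive CHUNK_SECONDS-long slices of a sample list."""
    size = SAMPLE_RATE * CHUNK_SECONDS
    for start in range(0, len(samples), size):
        yield samples[start:start + size]

# 65 seconds of silence -> two full chunks plus a 5-second remainder.
stream = [0] * (SAMPLE_RATE * 65)
chunks = list(chunk_stream(stream))
print([len(c) // SAMPLE_RATE for c in chunks])  # -> [30, 30, 5]
```

The remainder chunk matters in practice: a recorder that only emits full chunks would silently drop the tail of every session.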
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: the raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure a cloud provider such as Deepgram for faster processing, but local Whisper is the standard.)
Diarization: as it transcribes, the engine also performs "diarization", the technical term for speaker identification. It analyzes the audio to distinguish your voice from the voices of others, labeling who said what.
Storage: the final transcribed text is then indexed into a local SQLite database. The database uses Full-Text Search (FTS5), which is what lets you instantly search for a phrase you heard in a meeting three weeks ago.
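The FTS5 indexing step can be sketched with Python's built-in sqlite3 module (assuming your SQLite build ships the FTS5 extension, as most do). The table and column names are illustrative, not ScreenPipe's actual schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# An FTS5 virtual table; MATCH gives fast full-text search over `text`.
con.execute("CREATE VIRTUAL TABLE transcripts USING fts5(text, file, ts)")
con.executemany(
    "INSERT INTO transcripts VALUES (?, ?, ?)",
    [
        ("let's sync the quarterly numbers on Friday", "mic_2026-05-12.mp4", "06:49"),
        ("the NAS rebuild finished overnight", "sys_2026-05-11.mp4", "06:17"),
    ],
)
# Full-text search: find which recording a phrase came from.
hits = con.execute(
    "SELECT file FROM transcripts WHERE transcripts MATCH ?",
    ("quarterly numbers",),
).fetchall()
print(hits)  # -> [('mic_2026-05-12.mp4',)]
```

Unlike a LIKE scan, the FTS index is what keeps that query instant even across months of accumulated transcripts.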
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper needs CPU/GPU resources to transcribe. During a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
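The buffer, queue, and finalize steps above can be sketched as a producer/consumer pair using Python's queue.Queue; fake_transcribe is a stand-in for the real Whisper call:

```python
import queue
import threading

audio_chunks = queue.Queue()   # WIP stage: chunks waiting for the engine
database = []                  # stand-in for the SQLite commit

def fake_transcribe(chunk):
    # Placeholder for the actual Whisper inference step.
    return f"transcript of {chunk}"

def worker():
    while True:
        chunk = audio_chunks.get()
        if chunk is None:                        # sentinel: recorder shut down
            break
        database.append(fake_transcribe(chunk))  # finalization: commit text

t = threading.Thread(target=worker)
t.start()
# The capture layer can produce faster than the engine consumes;
# the queue absorbs the backlog.
for ts in ("06:49:17", "06:49:47", "06:50:17"):
    audio_chunks.put(f"chunk@{ts}")
audio_chunks.put(None)
t.join()
print(database)
```

Because a single worker drains the queue in FIFO order, transcripts land in the database in the order the audio was heard, even when the producer briefly outruns the engine.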
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory such as ~/.screenpipe on Mac/Linux, or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite database (db.sqlite): this is the master ledger. If text exists inside this database, the audio has been fully transcribed, diarized, and is "done."
The data folder: this is where the compressed raw audio files and JPEG screenshots are stored permanently. Think of it as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Expand
Listen
Show thinking
Gemini said
Gemini said
Yes, those
.mp4
files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your
~/.screenpipe/data/
directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the
db.sqlite
database along with
precise timestamps and file path references
.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific
.mp4
file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw
.mp4
files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that
.mp4
file is your fallback source of truth to verify what was actually said.
How the Database Views Them
How the Database Views Them
If you were to open up your
db.sqlite
and query the tables (e.g.,
audio_transcriptions
), you would see that the database treats the
~/.screenpipe/data/
folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like
soundcore AeroClip (input)_2026-05-12_07-40-48.mp4
.
What happens if you delete them?
If you manually...
You said
Explain to me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
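That chunking step can be sketched in a few lines; the 30-second chunk length below is an illustration, not ScreenPipe's actual setting:

```python
def chunk_stream(samples: list[float], sample_rate: int, chunk_seconds: int = 30) -> list[list[float]]:
    """Split a continuous capture buffer into fixed-length chunks.

    This mirrors the "break the continuous audio into manageable chunks"
    step described above; the 30 s length is an assumption for illustration.
    """
    step = sample_rate * chunk_seconds
    return [samples[i:i + step] for i in range(0, len(samples), step)]

# 90 seconds of (fake) 16 kHz audio splits into three 30-second chunks.
fake_audio = [0.0] * (16_000 * 90)
chunks = chunk_stream(fake_audio, sample_rate=16_000)
print(len(chunks))  # 3
```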
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: As it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
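To make the FTS5 point concrete, here is a miniature, self-contained demo; the table and column names are invented for the example and are not ScreenPipe's actual schema:

```python
import sqlite3

# Build a tiny FTS5 index in memory to show what "instant search" relies on;
# ScreenPipe's real schema in db.sqlite will differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE transcripts USING fts5(spoken_text, recorded_at)")
conn.executemany(
    "INSERT INTO transcripts VALUES (?, ?)",
    [
        ("let's move the quarterly review to Thursday", "2026-05-12T09:14:03"),
        ("the RAID rebuild finished overnight", "2026-05-12T11:02:41"),
    ],
)

# MATCH hits the full-text index, so this stays fast over months of audio.
rows = conn.execute(
    "SELECT recorded_at FROM transcripts WHERE transcripts MATCH 'quarterly'"
).fetchall()
print(rows)  # [('2026-05-12T09:14:03',)]
```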
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
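The three stages can be sketched as a simple producer/consumer queue; the names here are illustrative, not ScreenPipe's internals:

```python
import queue

# Buffering: the capture side enqueues raw chunks as they arrive.
backlog: queue.Queue[str] = queue.Queue()
for i in range(3):
    backlog.put(f"chunk-{i}.raw")

# Processing queue + finalization: the (slower) transcription side drains
# the backlog, committing each finished result. A stub stands in for Whisper.
committed = []
while not backlog.empty():
    chunk = backlog.get()
    committed.append((chunk, f"transcript of {chunk}"))

print(len(committed))  # 3, i.e. the WIP backlog is fully drained
```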
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux, or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (db.sqlite): This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data folders: This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary chunks, or locked database journals, this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
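A small status check along those lines; the layout (db.sqlite at the top level, a SQLite -wal journal beside it while writes are in flight, finished media under data/) follows the description above but should be verified against your own install:

```python
from pathlib import Path

def archive_status(root: Path) -> tuple[bool, int]:
    """Report (writes_in_flight, archived_media_count) for a ScreenPipe dir.

    A db.sqlite-wal journal next to the database suggests SQLite is still
    committing (the WIP stage); .mp4 files under data/ are the done archive.
    These paths are assumptions based on the description above.
    """
    writes_in_flight = (root / "db.sqlite-wal").exists()
    data = root / "data"
    media = list(data.rglob("*.mp4")) if data.is_dir() else []
    return writes_in_flight, len(media)

busy, count = archive_status(Path.home() / ".screenpipe")
print("WIP backlog" if busy else "caught up", "/", count, "archived files")
```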
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text; it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
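A sketch of what that re-processing loop could look like; `transcribe` is a placeholder for whichever newer engine you plug in later, and the function name and return shape are assumptions:

```python
from pathlib import Path
from typing import Callable

def reprocess_archive(data_dir: Path, transcribe: Callable[[Path], str]) -> dict[str, str]:
    """Re-run a (better) transcription engine over every archived .mp4.

    `transcribe` is whatever model you adopt later; keeping the raw media
    files is what makes this loop possible at all.
    """
    return {f.name: transcribe(f) for f in sorted(data_dir.rglob("*.mp4"))}
```

With real files in place you would pass in a call into, say, a newer Whisper build; any `Callable[[Path], str]` works.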
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth to verify what was actually said.
How the Database Views Them
If you were to open up your db.sqlite and query the tables (e.g., audio_transcriptions), you would see that the database treats the ~/.screenpipe/data/ folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like soundcore AeroClip (input)_2026-05-12_07-40-48.mp4.
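A toy version of that lookup; `audio_transcriptions` is the table name mentioned above, but the column names here are guesses for illustration:

```python
import sqlite3

# Demonstrate the "soft foreign key" idea: the row stores a filename string,
# and resolving it to real bytes on disk is left to the application.
# Column names are illustrative; only the table name comes from above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audio_transcriptions (transcription TEXT, file_path TEXT, offset_seconds REAL)"
)
conn.execute(
    "INSERT INTO audio_transcriptions VALUES (?, ?, ?)",
    ("raid rebuild finished", "soundcore AeroClip (input)_2026-05-12_07-40-48.mp4", 412.5),
)

path, offset = conn.execute(
    "SELECT file_path, offset_seconds FROM audio_transcriptions "
    "WHERE transcription LIKE ?", ("%rebuild%",)
).fetchone()
print(path)  # the UI would open this file and seek to `offset`
```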
What happens if you delete them?
If you manually rm those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection: ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving: If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly move older files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
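That archiving idea could be scripted roughly like this (the NAS mount point and the 30-day cutoff are assumptions; this is a sketch to run from cron or launchd, not a tested tool):

```python
import shutil
import time
from pathlib import Path

def offload(src: Path, dst: Path, max_age_days: int = 30) -> int:
    """Move old media from `src` to `dst` (e.g. a NAS mount), leaving
    symlinks behind so ScreenPipe's database paths keep resolving."""
    cutoff = time.time() - max_age_days * 86_400
    moved = 0
    for f in list(src.rglob("*.mp4")):
        if f.is_symlink() or f.stat().st_mtime > cutoff:
            continue  # already offloaded, or still fresh
        target = dst / f.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(f), target)  # heavy bytes go to the archive...
        f.symlink_to(target)         # ...the local path keeps working
        moved += 1
    return moved

# Example invocation; "/Volumes/nas/screenpipe" is a hypothetical mount.
# offload(Path.home() / ".screenpipe" / "data", Path("/Volumes/nas/screenpipe"))
```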
You said
yes I will do that. Is there a way to setup languages to transcription?
Gemini said...
If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). 
Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy 
prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. 
Are these used for anything after transcribing and storing in sqlite db?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Expand","depth":21,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Yes, those","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files are absolutely still used after the transcription process is complete. 
They are not temporary processing chunks; they act as your permanent media archive.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is exactly what those files are doing sitting in your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"directory and how they interact with the rest of the system.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"1. The \"Time Machine\" Playback","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. The \"Time Machine\" Playback","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The primary reason ScreenPipe keeps these files is for audio playback. 
When Whisper transcribes your meetings or ambient audio, it writes the text into the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"database along with","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"precise timestamps and file path references","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. Re-processing and Model Upgrades","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. Re-processing and Model Upgrades","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Local LLMs and transcription models are improving rapidly. 
Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The Source of Truth for Hallucinations","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The Source of Truth for Hallucinations","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. 
If a database query returns a completely nonsensical sentence, that","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file is your fallback source of truth to verify what was actually said.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How the Database Views Them","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How the Database Views Them","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you were to open up your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"and query the tables (e.g.,","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"audio_transcriptions","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), you would see that the database treats the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folder essentially as a blob storage backend. 
The tables contain columns acting as soft foreign keys pointing to filenames like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"soundcore AeroClip (input)_2026-05-12_07-40-48.mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"What happens if you delete them?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you manually","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"rm","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. 
However, any attempt to play back the audio for those older transcripts will silently fail or throw a \"file not found\" error in the background logs.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Managing the Storage Footprint","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Managing the Storage Footprint","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe records continuously, this folder will inevitably grow over time.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Built-in Garbage Collection:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Archiving:","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.02642952,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you prefer to keep a permanent, searchable \"life log\" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. 
You could script a simple cron job to regularly","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.22041224,"height":0.057861134},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"older","depth":29,"bounds":{"left":0.18035239,"top":0.0,"width":0.015292553,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.22174202,"height":0.057861134},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"bounds":{"left":0.075465426,"top":0.029928172,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"bounds":{"left":0.08610372,"top":0.029928172,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"bounds":{"left":0.09674202,"top":0.029928172,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"bounds":{"left":0.107380316,"top":0.029928172,"width":0.010638298,"height":0.025538707},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more 
options","depth":23,"bounds":{"left":0.11801862,"top":0.029928172,"width":0.010638298,"height":0.025538707},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy prompt","depth":21,"bounds":{"left":0.12566489,"top":0.0905826,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Edit","depth":21,"bounds":{"left":0.14029256,"top":0.0905826,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said yes I will do that. Is there a way to setup languages to transcription?","depth":21,"bounds":{"left":0.16023937,"top":0.100159615,"width":0.15026596,"height":0.03830806},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"bounds":{"left":0.068484046,"top":0.1009577,"width":0.019946808,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"yes I will do that. 
Is there a way to setup languages to transcription?","depth":23,"bounds":{"left":0.16023937,"top":0.10175578,"width":0.12849069,"height":0.035514764},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"bounds":{"left":0.3025266,"top":0.17039107,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"bounds":{"left":0.09208777,"top":0.17278531,"width":0.030917553,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"bounds":{"left":0.08976064,"top":0.21428572,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"bounds":{"left":0.08976064,"top":0.21628092,"width":0.04105718,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
|
-1975085737178031554
|
8487585703635086293
|
visual_change
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Mute tab
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Explain to me: how does the ScreenPipe audio recording work? Where does it start? How does it get transferred and transcribed? Is there a work-in-progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: As it transcribes, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish your voice from the voices of others, labeling who said what.
Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
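The FTS5 lookup described above can be sketched with Python's built-in sqlite3 module. Note that the table and column names (audio_transcriptions, timestamp, transcription) are assumptions about ScreenPipe's schema, not confirmed against its source; inspect your own db.sqlite before relying on them.

```python
import sqlite3

def search_transcripts(db_path: str, phrase: str):
    """Full-text search over transcribed audio via SQLite FTS5.

    The table/column names below are assumed, not confirmed against
    ScreenPipe's actual schema -- check your db.sqlite to adapt them.
    """
    conn = sqlite3.connect(db_path)
    try:
        # MATCH only works against an FTS virtual table; ORDER BY rank
        # sorts best matches first using FTS5's built-in relevance.
        return conn.execute(
            "SELECT timestamp, transcription FROM audio_transcriptions "
            "WHERE transcription MATCH ? ORDER BY rank",
            (phrase,),
        ).fetchall()
    finally:
        conn.close()
```

Under FTS5, MATCH also accepts quoted phrases and boolean operators, so a query like '"quarterly review" AND budget' works as well.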
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux, or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (db.sqlite): This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The media folders (e.g., ~/.screenpipe/data/): This is where the compressed raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary chunks, or locked database journals (such as a -wal or -journal file next to the database), the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
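The folder heuristics above can be turned into a quick status check. The layout assumed here (db.sqlite at the root of ~/.screenpipe, media under data/) follows this description and the paths mentioned later in the conversation; adjust it to your actual install.

```python
from pathlib import Path

def processing_status(screenpipe_dir: str) -> dict:
    """Infer 'done vs. in flight' from the folder layout alone.

    Assumes the layout described above: db.sqlite at the root and
    archived media under data/ -- adjust paths for your install.
    """
    root = Path(screenpipe_dir).expanduser()
    # A non-empty SQLite write-ahead log means writes are in flight.
    wal = root / "db.sqlite-wal"
    data = root / "data"
    media = list(data.rglob("*.mp4")) if data.exists() else []
    return {
        "db_exists": (root / "db.sqlite").exists(),
        "actively_writing": wal.exists() and wal.stat().st_size > 0,
        "archived_media_files": len(media),
    }
```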
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text: it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files you still get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
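Because the filenames you listed embed the recording's start time (e.g. System Audio (output)_2026-05-11_06-17-14.mp4), the playback offset for a transcript hit is simple arithmetic. The filename pattern below is inferred from those examples, not from ScreenPipe's source, so treat it as a sketch.

```python
import re
from datetime import datetime

def seek_offset_seconds(media_filename: str, spoken_at: datetime) -> float:
    """Seconds into the media file at which a transcript hit was spoken.

    Filenames like 'System Audio (output)_2026-05-11_06-17-14.mp4'
    encode the recording's start time; the regex is inferred from
    those examples and may need adjusting for other naming schemes.
    """
    m = re.search(r"_(\d{4}-\d{2}-\d{2})_(\d{2}-\d{2}-\d{2})\.mp4$", media_filename)
    if not m:
        raise ValueError(f"unrecognized filename: {media_filename}")
    start = datetime.strptime(f"{m.group(1)}_{m.group(2)}", "%Y-%m-%d_%H-%M-%S")
    return (spoken_at - start).total_seconds()
```

A player can then jump straight to that offset instead of scrubbing through the whole recording.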
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw .mp4 files allows you to re-process your historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth to verify what was actually said.
How the Database Views Them
If you were to open up your db.sqlite and query the tables (e.g., audio_transcriptions), you would see that the database treats the ~/.screenpipe/data/ folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like soundcore AeroClip (input)_2026-05-12_07-40-48.mp4.
What happens if you delete them?
If you manually rm those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
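That failure mode can be audited up front: the sketch below lists database rows whose referenced media file is gone. The file_path column name is an assumption about ScreenPipe's schema; verify it against your db.sqlite before trusting the output.

```python
import sqlite3
from pathlib import Path

def find_dangling_media(db_path: str) -> list:
    """Transcript media references whose file no longer exists on disk.

    Assumes an audio_transcriptions table with a file_path column --
    an unverified guess at ScreenPipe's schema; adapt as needed.
    """
    conn = sqlite3.connect(db_path)
    try:
        paths = [row[0] for row in conn.execute(
            "SELECT DISTINCT file_path FROM audio_transcriptions"
        )]
    finally:
        conn.close()
    # Keep only non-empty paths that no longer resolve to a file.
    return [p for p in paths if p and not Path(p).exists()]
```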
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection: ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days, or when the folder hits a specific GB limit).
Archiving: If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly move older files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
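The cron-job idea above can be sketched as follows. The directory paths and the 30-day threshold are placeholders, and you should dry-run it against a copy of the data before pointing it at the real archive.

```python
import shutil
import time
from pathlib import Path

def archive_old_media(data_dir: str, nas_dir: str, max_age_days: int = 30) -> list:
    """Move old .mp4 files to a NAS mount, leaving symlinks behind.

    The symlink keeps the file paths stored in db.sqlite resolving,
    so playback keeps working. Paths and threshold are placeholders.
    """
    cutoff = time.time() - max_age_days * 86400
    src_root, dst_root = Path(data_dir).expanduser(), Path(nas_dir)
    moved = []
    for f in sorted(src_root.rglob("*.mp4")):
        if f.is_symlink() or f.stat().st_mtime > cutoff:
            continue  # already archived, or too recent to move
        dst = dst_root / f.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(f), str(dst))
        f.symlink_to(dst)  # the original path now resolves to the NAS copy
        moved.append(str(f))
    return moved
```

Scheduled nightly from cron (or launchd on macOS), this keeps the local folder bounded while the NAS holds the full history.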
You said
Yes, I will do that. Is there a way to set up languages for transcription?
Gemini said...
You said
Explain to me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work-in-progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
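The chunking step described above can be sketched in a few lines: a continuous stream of samples is cut into fixed-length segments before transcription. The chunk length and sample rate below are illustrative placeholders, not ScreenPipe's actual values.

```python
# Hypothetical sketch of chunking a continuous audio stream into
# fixed-duration segments ahead of transcription.

def chunk_samples(samples, sample_rate=16_000, chunk_seconds=30):
    """Split a flat sequence of samples into chunks of chunk_seconds each."""
    size = sample_rate * chunk_seconds
    return [samples[i:i + size] for i in range(0, len(samples), size)]

# 70 seconds of (fake) audio at 16 kHz -> three chunks: 30 s, 30 s, 10 s
stream = [0] * (16_000 * 70)
chunks = chunk_samples(stream)
print([len(c) // 16_000 for c in chunks])  # -> [30, 30, 10]
```

The last chunk is simply shorter; the engine processes whatever it gets when the stream pauses or the buffer flushes.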
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage:
The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
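A minimal, self-contained illustration of the FTS5 mechanism referred to above. The transcripts table and its columns are invented for this sketch, not ScreenPipe's real schema; the point is that MATCH consults a full-text index rather than scanning every row.

```python
import sqlite3

# Toy FTS5 demo: index two transcript snippets, then search by keyword.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE transcripts USING fts5(text, timestamp UNINDEXED)")
db.execute("INSERT INTO transcripts VALUES (?, ?)",
           ("let's move the standup to thursday", "2026-05-12T07:41:02Z"))
db.execute("INSERT INTO transcripts VALUES (?, ?)",
           ("the RAID array finished rebuilding", "2026-05-12T09:15:44Z"))

# MATCH hits the full-text index instead of doing a linear LIKE scan.
rows = db.execute(
    "SELECT timestamp, text FROM transcripts WHERE transcripts MATCH ?",
    ("standup",)).fetchall()
print(rows)
```

This prints only the row containing "standup", along with its timestamp, which is exactly the shape of result a search UI needs.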
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
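The buffer-queue-finalize flow above can be sketched as a toy producer-consumer loop. Everything here is a stand-in: transcribe is a stub in place of Whisper, and the list plays the role of the SQLite commit.

```python
import queue
import threading

def transcribe(chunk):
    return f"text for {chunk}"        # stub standing in for Whisper

pending = queue.Queue()               # the processing backlog (WIP stage)
committed = []                        # stand-in for the SQLite commit

def worker():
    while True:
        chunk = pending.get()
        if chunk is None:             # sentinel: capture stopped
            break
        committed.append(transcribe(chunk))

t = threading.Thread(target=worker)
t.start()
for name in ("chunk-001", "chunk-002", "chunk-003"):
    pending.put(name)                 # the capture side can outrun the worker
pending.put(None)
t.join()
print(committed)  # -> ['text for chunk-001', 'text for chunk-002', 'text for chunk-003']
```

Because the queue decouples capture from transcription, a burst of conversation simply deepens the backlog rather than dropping audio.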
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (db.sqlite):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary chunks, or locked database journals, this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
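A rough way to eyeball "done vs. in progress" from file names alone, following the conventions described above (.mp4 for archived media; temp chunks and database journals for active processing). The exact suffixes are assumptions for the sketch, not ScreenPipe's documented ones.

```python
import pathlib
import tempfile

def classify(folder):
    """Split a data folder's contents into archived media vs. WIP artifacts."""
    done, wip = [], []
    for p in pathlib.Path(folder).iterdir():
        # Journal/WAL files and temp chunks suggest active processing.
        if p.suffix in {".tmp", ".wal"} or p.name.endswith("-journal"):
            wip.append(p.name)
        elif p.suffix == ".mp4":
            done.append(p.name)       # permanently archived media
    return sorted(done), sorted(wip)

with tempfile.TemporaryDirectory() as d:
    for name in ("mic_2026-05-12.mp4", "db.sqlite-journal", "chunk01.tmp"):
        (pathlib.Path(d) / name).touch()
    done, wip = classify(d)
print(done, wip)  # -> ['mic_2026-05-12.mp4'] ['chunk01.tmp', 'db.sqlite-journal']
```

Pointing a script like this at the real ~/.screenpipe directory would give a quick snapshot of whether the engine is caught up.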
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in the SQLite db?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
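The lookup the UI performs can be sketched roughly like this. The table and column names (text, file_path, offset_ms) are illustrative stand-ins, not ScreenPipe's confirmed schema; the point is that the database stores only text plus a pointer and offset, and the player seeks into the .mp4.

```python
import sqlite3

# Toy playback lookup: find the transcript row for a keyword, then derive
# the file and seek position a media player would need.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE audio_transcriptions
              (text TEXT, file_path TEXT, offset_ms INTEGER)""")
db.execute("INSERT INTO audio_transcriptions VALUES (?, ?, ?)",
           ("ship it on friday",
            "System Audio (output)_2026-05-11_06-17-14.mp4",
            754_000))

path, offset_ms = db.execute(
    "SELECT file_path, offset_ms FROM audio_transcriptions WHERE text LIKE ?",
    ("%friday%",)).fetchone()
print(f"play {path} from {offset_ms / 1000:.0f}s")  # the UI's seek target
```

Delete the .mp4 and this query still succeeds, which is why the failure described earlier is silent: only the playback step notices the file is gone.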
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
Are these used for anything after transcribing and storing in sqlite db?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Expand","depth":21,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Yes, those","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files are absolutely still used after the transcription process is complete. 
They are not temporary processing chunks; they act as your permanent media archive.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is exactly what those files are doing sitting in your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"directory and how they interact with the rest of the system.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"1. The \"Time Machine\" Playback","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. The \"Time Machine\" Playback","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The primary reason ScreenPipe keeps these files is for audio playback. 
When Whisper transcribes your meetings or ambient audio, it writes the text into the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"database along with","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"precise timestamps and file path references","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. Re-processing and Model Upgrades","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. Re-processing and Model Upgrades","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Local LLMs and transcription models are improving rapidly. 
Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The Source of Truth for Hallucinations","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The Source of Truth for Hallucinations","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. 
If a database query returns a completely nonsensical sentence, that","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file is your fallback source of truth to verify what was actually said.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How the Database Views Them","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How the Database Views Them","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you were to open up your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"and query the tables (e.g.,","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"audio_transcriptions","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), you would see that the database treats the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folder essentially as a blob storage backend. 
The tables contain columns acting as soft foreign keys pointing to filenames like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"soundcore AeroClip (input)_2026-05-12_07-40-48.mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"What happens if you delete them?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you manually","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"rm","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
You said
Explain to me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work-in-progress stage? How can I tell from the folder structure what has and hasn't been done?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system's audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
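The chunking step can be sketched in a few lines. This is an illustration of the idea only, not ScreenPipe's actual implementation; the sample rate and chunk length are assumed values.

```python
# Illustrative sketch (not ScreenPipe's actual code): splitting a continuous
# stream of audio samples into fixed-duration chunks for transcription.
SAMPLE_RATE = 16_000      # samples per second (assumed)
CHUNK_SECONDS = 30        # chunk length in seconds (assumed)

def chunk_samples(samples, sample_rate=SAMPLE_RATE, chunk_seconds=CHUNK_SECONDS):
    """Yield successive fixed-size chunks; the final chunk may be shorter."""
    size = sample_rate * chunk_seconds
    for start in range(0, len(samples), size):
        yield samples[start:start + size]

# 65 seconds of silent (fake) audio splits into 30 s + 30 s + 5 s chunks
stream = [0.0] * (SAMPLE_RATE * 65)
chunks = list(chunk_samples(stream))
```

Fixed-size chunks keep the transcription engine's latency and memory use predictable regardless of how long the recording runs.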
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: the raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: as it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage: the final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
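The FTS5 idea can be demonstrated with Python's built-in `sqlite3` module. The table and column names here are illustrative, not ScreenPipe's actual schema; this assumes your Python build bundles an SQLite compiled with FTS5 (most modern builds are).

```python
import sqlite3

# Minimal sketch of the FTS5 idea: an indexed text column plus an
# unindexed timestamp. Names are illustrative, not ScreenPipe's schema.
con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE transcripts USING fts5(text, timestamp UNINDEXED)")
con.execute("INSERT INTO transcripts VALUES (?, ?)",
            ("we agreed to ship the migration on friday", "2026-05-12T10:00:00"))
con.execute("INSERT INTO transcripts VALUES (?, ?)",
            ("lunch order for the team", "2026-05-12T12:00:00"))

# MATCH queries the full-text index, so search stays fast at scale
hits = con.execute(
    "SELECT timestamp, text FROM transcripts WHERE transcripts MATCH ?",
    ("migration",)).fetchall()
```

A `MATCH` against the inverted index avoids scanning every row, which is what makes "find that phrase from three weeks ago" feel instant.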
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite database (db.sqlite): this is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data folders: this is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp files: if you see rapidly changing files, temporary chunks, or locked database journals, this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
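The journal check above can be automated. This is a rough heuristic sketch: the directory layout is assumed, and the one solid fact it relies on is that SQLite leaves `-wal` / `-journal` sidecar files next to a database while writes are in flight.

```python
from pathlib import Path

# Heuristic sketch: look for signs of in-flight processing in the
# ScreenPipe directory. SQLite creates "<db>-wal" or "<db>-journal"
# sidecar files while the database is being written.
def wip_indicators(screenpipe_dir: str) -> list[str]:
    base = Path(screenpipe_dir).expanduser()
    found = []
    for pattern in ("*-wal", "*-journal"):
        found += [p.name for p in base.glob(pattern)]
    return sorted(found)
```

An empty result suggests the transcription engine has caught up with its backlog; a lingering `-wal` file means commits are still landing.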
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in the SQLite DB?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text: it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
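That lookup can be sketched as a single query. The schema here (a table with `text`, `file_path`, and `offset_seconds` columns) is illustrative, not ScreenPipe's verified layout.

```python
import sqlite3

# Sketch of the playback lookup: given a keyword, find which media file
# to open and where to seek. Schema is an assumption for illustration.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE audio_transcriptions
               (text TEXT, file_path TEXT, offset_seconds REAL)""")
con.execute("INSERT INTO audio_transcriptions VALUES (?, ?, ?)",
            ("let's revisit the budget",
             "MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4", 83.5))

path, offset = con.execute(
    "SELECT file_path, offset_seconds FROM audio_transcriptions "
    "WHERE text LIKE ?", ("%budget%",)).fetchone()
# A player would now open `path` and seek to `offset` seconds.
```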
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth to verify what was actually said.
How the Database Views Them
If you were to open up your db.sqlite and query the tables (e.g., audio_transcriptions), you would see that the database treats the ~/.screenpipe/data/ folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like soundcore AeroClip (input)_2026-05-12_07-40-48.mp4.
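One consequence of soft foreign keys is that nothing enforces them: a row can point at a file that no longer exists. The integrity check below sketches how you might surface that; the table and column names are assumptions carried over from the description above.

```python
import sqlite3
from pathlib import Path

# Sketch of an integrity check implied by the "soft foreign key" design:
# list rows whose referenced media file no longer exists on disk.
# Table/column names are assumptions, not ScreenPipe's verified schema.
def dangling_references(db_path: str, data_dir: str) -> list[str]:
    con = sqlite3.connect(db_path)
    rows = con.execute("SELECT DISTINCT file_path FROM audio_transcriptions")
    on_disk = {p.name for p in Path(data_dir).glob("*.mp4")}
    return sorted(fp for (fp,) in rows if Path(fp).name not in on_disk)
```

A non-empty result means some transcripts have lost their playback audio; the text survives, but the "time machine" does not.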
What happens if you delete them?
If you manually rm...
Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. 
How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. 
Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. 
The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. 
If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. 
If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
|
-4875849404314601076
|
9210195691199191703
|
click
|
accessibility
|
NULL
|
Conversation with Gemini
You said
Explain to me: how does the ScreenPipe audio recording work? Where does it start? How is it getting transferred and transcribed? Is there some work-in-progress stage? How can I tell from the folder structure what has been done and what hasn't?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: As it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
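The FTS5 lookup described above can be sketched with Python's built-in sqlite3 module. The table and column names here are illustrative stand-ins, not ScreenPipe's actual schema:

```python
import sqlite3

# Build a tiny in-memory FTS5 index standing in for the transcript store.
con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE transcripts USING fts5(text, timestamp)")
con.executemany(
    "INSERT INTO transcripts VALUES (?, ?)",
    [
        ("let's move the deadline to Friday", "2026-05-12T10:00:00"),
        ("the build is green again", "2026-05-12T11:00:00"),
    ],
)

# MATCH queries hit the full-text index instead of scanning every row,
# which is why searching weeks of audio stays fast.
rows = con.execute(
    "SELECT timestamp FROM transcripts WHERE transcripts MATCH ?",
    ("deadline",),
).fetchall()
print(rows)  # [('2026-05-12T10:00:00',)]
```

If your local SQLite build lacks FTS5, the CREATE VIRTUAL TABLE call raises sqlite3.OperationalError; most modern Python distributions ship with it enabled.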
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux, or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (db.sqlite): This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data folders (e.g., ~/.screenpipe/data/): This is where the compressed raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary chunks, or locked database journals (such as -wal or -journal files), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
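That "is the backlog active?" check can be sketched from the filesystem. SQLite keeps a -wal or -journal file beside the database while writes are in flight; note that a -wal file can also linger while a connection is merely open, so treat this as a heuristic, not proof. A self-contained sketch using a temp directory as a stand-in for ~/.screenpipe:

```python
import os
import tempfile

# Stand-in for ~/.screenpipe so the sketch is self-contained; point
# data_dir at the real directory to use it for real.
data_dir = tempfile.mkdtemp()
open(os.path.join(data_dir, "db.sqlite"), "wb").close()
open(os.path.join(data_dir, "db.sqlite-wal"), "wb").close()  # simulate in-flight writes

def backlog_active(directory: str) -> bool:
    """Heuristic: journal/WAL files next to the DB suggest WIP processing."""
    return any(
        os.path.exists(os.path.join(directory, "db.sqlite" + suffix))
        for suffix in ("-wal", "-journal")
    )

print(backlog_active(data_dir))  # True
```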
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4, MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4, soundcore AeroClip (input)_2026-05-12_07-40-48.mp4, or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in the SQLite DB?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text: it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth to verify what was actually said.
How the Database Views Them
If you were to open up your db.sqlite and query the tables (e.g., audio_transcriptions), you would see that the database treats the ~/.screenpipe/data/ folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like soundcore AeroClip (input)_2026-05-12_07-40-48.mp4.
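That soft-foreign-key relationship can be sketched as follows; the audio_transcriptions schema here is a guess for illustration, not ScreenPipe's real one:

```python
import os
import sqlite3
import tempfile

# Create a stand-in media file like the ones in ~/.screenpipe/data/.
media_dir = tempfile.mkdtemp()
media_path = os.path.join(
    media_dir, "soundcore AeroClip (input)_2026-05-12_07-40-48.mp4"
)
open(media_path, "wb").close()

# The database row stores only the path string -- a "soft" foreign key
# with no enforcement that the file actually exists.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE audio_transcriptions (file_path TEXT, transcription TEXT)"
)
con.execute(
    "INSERT INTO audio_transcriptions VALUES (?, ?)",
    (media_path, "example transcript text"),
)

# Playback resolves the path at read time; deleting the .mp4 leaves the
# text intact but breaks this lookup.
(file_path,) = con.execute(
    "SELECT file_path FROM audio_transcriptions"
).fetchone()
playable = os.path.exists(file_path)
print(playable)  # True while the archive file is still on disk
```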
What happens if you delete them?
If you manually rm those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection: ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving: If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly move older files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
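The archive-and-symlink step can be sketched as below, using temp directories as stand-ins for the local data folder and the NAS mount; a real cron job would also filter by file age before moving anything:

```python
import os
import shutil
import tempfile

local = tempfile.mkdtemp()  # stand-in for ~/.screenpipe/data
nas = tempfile.mkdtemp()    # stand-in for the RAID 5 mount

name = "System Audio (output)_2026-05-11_06-17-14.mp4"
open(os.path.join(local, name), "wb").close()

# Move each recording to the NAS, then leave a symlink behind so the
# file_path stored in db.sqlite still resolves to playable audio.
for fname in os.listdir(local):
    if fname.endswith(".mp4"):
        src = os.path.join(local, fname)
        dst = os.path.join(nas, fname)
        shutil.move(src, dst)
        os.symlink(dst, src)

print(os.path.islink(os.path.join(local, name)))  # True
```

Note the symlink stores an absolute path to the NAS copy, so the NAS must be mounted at the same path for playback to keep working.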
You said
Yes, I will do that. Is there a way to set up languages for transcription?
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription: yes, you can configure it, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
By default, ScreenPipe uses a multilingual Whisper model and sets the language configuration to Auto-Detect.
When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage: If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback: Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
Open the ScreenPipe settings.
Navigate to the Audio or AI/Transcription section.
Look for the Whisper Language or Transcription Language dropdown.
Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).
If you are running ScreenPipe via CLI/Config:
You can modify your underlying configuration (usually found in ~/.screenpipe/pipe.json or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:
sk (Slovak)
bg (Bulgarian)
en (English)
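As a purely hypothetical illustration (the real key names in pipe.json may differ, so check the ScreenPipe documentation for the actual schema), forcing Slovak in a JSON config might look like:

```json
{
  "transcription": {
    "language": "sk"
  }
}
```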
If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
Good response
Bad response
Redo
Share and export
Copy
Show more options
Ask Gemini
Ask Gemini
Open upload file menu
Tools
Open mode picker
Pro
Microphone
Send message
Gemini is AI and can make mistakes, including about people.
Your privacy and Gemini Opens in a new window
Your privacy and Gemini
Opens in a new window
Summarize page
Summarize page
Screenpipe [archive.db · 2071.1MB]
Screenpipe
[archive.db · 2071.1MB]
Activity
Search
Audio
Work Report
Timetable
AI Summary
Date
12
/
05
/
2026
Calendar
Monitor
Jump to
--
:
--
Go...
|
Conversation with Gemini

You said:

Explain to me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work-in-progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Gemini said:

Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine. ScreenPipe is fundamentally designed to be a 100% local, privacy-first application: the vast majority of its processing happens directly on your hardware, without relying on the cloud.

1. Where It Starts: The Capture Layer

The process begins locally, by tapping directly into your operating system's audio routing. ScreenPipe captures two separate audio streams simultaneously:

- System Audio: everything your computer outputs (e.g., the voices of the other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
- Microphone Input: everything you say into your mic.

Because ScreenPipe runs as a 24/7 background process, it listens to these streams continuously and breaks the audio into manageable chunks to prepare them for the next stage.

2. How It Gets Transferred and Transcribed

Because ScreenPipe prioritizes local processing, the "transfer" step is very short: the audio data never leaves your machine unless you explicitly set up a cloud plugin. The transcription pipeline looks like this:

- The engine: the raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe runs OpenAI Whisper locally on your hardware. (You can also configure cloud providers such as Deepgram for faster processing, but local Whisper is the standard.)
- Diarization: as it transcribes, the engine also performs "diarization", the technical term for speaker identification. It analyzes the audio to distinguish your voice from other people's, labeling who said what.
- Storage: the transcribed text is then indexed into a local SQLite database. The database uses full-text search (FTS5), which is what lets you instantly search for a phrase you heard in a meeting three weeks ago.
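The FTS5 search described above can be sketched with Python's built-in sqlite3 module. The table name and columns below are illustrative stand-ins, not ScreenPipe's actual schema:

```python
import sqlite3

def build_index(rows):
    """Create an in-memory FTS5 index over transcription text (demo schema)."""
    db = sqlite3.connect(":memory:")
    # An fts5 virtual table indexes every column for full-text MATCH queries.
    db.execute("CREATE VIRTUAL TABLE transcriptions USING fts5(text, device, timestamp)")
    db.executemany("INSERT INTO transcriptions VALUES (?, ?, ?)", rows)
    return db

def search(db, query):
    """Return (device, timestamp) for chunks whose indexed text matches the query."""
    cur = db.execute(
        "SELECT device, timestamp FROM transcriptions WHERE transcriptions MATCH ?",
        (query,),
    )
    return cur.fetchall()

db = build_index([
    ("let's ship the release on Friday", "MacBook Pro Microphone", "2026-05-12 12:17"),
    ("the quarterly numbers look good", "System Audio", "2026-05-11 06:17"),
])
print(search(db, '"quarterly numbers"'))  # phrase query over the indexed text
```

FTS5 ships compiled into the SQLite bundled with most Python builds, which is why a search like this stays instant even over weeks of transcripts.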
3. The "Work in Progress" (WIP) Stage

There is a slight bottleneck between hearing the audio and saving the text, and that gap acts as the WIP stage:

- Buffering: audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on disk.
- Processing queue: Whisper requires CPU/GPU resources to transcribe, so during a rapid, multi-person conversation the raw audio chunks line up in a processing queue.
- Finalization: once the model finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.

4. Understanding the Folder Structure

All of ScreenPipe's data lives locally on your machine (usually in a hidden directory such as ~/.screenpipe on Mac/Linux, or in your AppData folder on Windows). You can tell what has been processed by looking at a few key areas in this directory:

- The SQLite database (db.sqlite): the master ledger. If text exists inside this database, the audio has been fully transcribed and diarized; it is "done."
- The data folder: where the compressed raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
- Temp files: rapidly changing files, temporary chunks, or locked database journals indicate that the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.

Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
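The "done vs. in progress" distinction above can be checked mechanically: any chunk file on disk that the database does not reference yet is still pending. The table and column names (audio_transcriptions, file_path) are assumptions for this sketch and may differ from the real schema:

```python
import sqlite3
import tempfile
from pathlib import Path

def pending_chunks(data_dir, db):
    """Chunk files on disk with no database row yet, i.e. still in the WIP stage.
    Assumes a table named audio_transcriptions with a file_path column."""
    done = {row[0] for row in db.execute("SELECT file_path FROM audio_transcriptions")}
    return sorted(p.name for p in Path(data_dir).glob("*.mp4") if str(p) not in done)

# Demo with a throwaway directory and an in-memory database.
data = Path(tempfile.mkdtemp())
(data / "done.mp4").touch()
(data / "pending.mp4").touch()
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE audio_transcriptions (file_path TEXT, transcription TEXT)")
db.execute(
    "INSERT INTO audio_transcriptions VALUES (?, ?)",
    (str(data / "done.mp4"), "hello world"),
)
print(pending_chunks(data, db))  # only the unreferenced file remains
```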
You said:

I see files in ~/.screenpipe/data/ such as "LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4", "MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4", "soundcore AeroClip (input)_2026-05-12_07-40-48.mp4", and "System Audio (output)_2026-05-11_06-17-14.mp4". Are these used for anything after transcribing and storing in the SQLite DB?

Gemini said:

Yes, those .mp4 files are absolutely still used after the transcription process is complete.
They are not temporary processing chunks; they act as your permanent media archive. Here is exactly what those files are doing in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.

1. The "Time Machine" Playback

The primary reason ScreenPipe keeps these files is audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file-path references. When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text: it uses the file path stored in the database to pull up that specific .mp4 file and start playing it at the exact second the keyword was spoken. Without these files you still get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
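Since each chunk filename embeds the recording start time, a transcript hit can be turned into a playback offset by parsing the filename. This is a sketch of the idea only; the real UI presumably reads timestamps from the database rather than from filenames:

```python
import re
from datetime import datetime

# Filenames follow "<device> (input|output)_YYYY-MM-DD_HH-MM-SS.mp4".
CHUNK = re.compile(r"^(?P<device>.+) \((?:input|output)\)_(?P<ts>[\d-]+_[\d-]+)\.mp4$")

def seek_offset(filename, spoken_at):
    """Seconds into the chunk at which a word spoken at `spoken_at` occurs."""
    start = datetime.strptime(CHUNK.match(filename).group("ts"), "%Y-%m-%d_%H-%M-%S")
    return (spoken_at - start).total_seconds()

offset = seek_offset(
    "System Audio (output)_2026-05-11_06-17-14.mp4",
    datetime(2026, 5, 11, 6, 19, 0),
)
print(offset)  # 106.0 -> start playback 106 seconds into the file
```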
2. Re-processing and Model Upgrades

Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or a radically better speaker-diarization algorithm is released, having the raw .mp4 files allows you to re-process your historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.

3. The Source of Truth for Hallucinations

Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth to verify what was actually said.

How the Database Views Them

If you open your db.sqlite and query the tables (e.g., audio_transcriptions), you will see that the database treats the ~/.screenpipe/data/ folder essentially as a blob-storage backend: the tables contain columns acting as soft foreign keys that point to filenames like "soundcore AeroClip (input)_2026-05-12_07-40-48.mp4".

What happens if you delete them? If you manually rm those files to free up space on your Mac's SSD, ScreenPipe won't crash, and the SQLite database will still hold all your searchable text.
However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.

Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection: ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving: If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them.
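One way to see how many transcript rows already point at deleted media is a small script like this. It is a sketch: the audio_transcriptions table is mentioned above, but the file_path column name is an assumption, so check your actual schema first (e.g. `sqlite3 db.sqlite .schema`).

```python
import os
import sqlite3

def find_orphaned_media(db_path, table="audio_transcriptions", path_column="file_path"):
    """Return media paths referenced by the database that no longer exist
    on disk.

    NOTE: table/column names are assumptions; inspect your own db.sqlite
    schema before relying on them.
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            f"SELECT DISTINCT {path_column} FROM {table}"
        ).fetchall()
    finally:
        conn.close()
    # Keep only paths that are non-empty and missing from the filesystem.
    return [p for (p,) in rows if p and not os.path.exists(p)]
```

Rows returned here are exactly the transcripts whose playback would fail with "file not found".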
You could script a simple cron job to regularly move older files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
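That cron-driven approach could be sketched roughly as follows. The archive destination and age threshold are placeholders; the key idea is that the symlink preserves the original path the database stored.

```python
import shutil
import time
from pathlib import Path

def archive_old_media(data_dir, archive_dir, max_age_days=30):
    """Move media older than max_age_days to archive_dir (e.g. a NAS
    mount) and leave a symlink behind so db.sqlite's stored paths keep
    resolving. Returns the names of the files that were moved."""
    cutoff = time.time() - max_age_days * 86400
    moved = []
    for f in Path(data_dir).glob("*.mp4"):
        # Skip files already archived (symlinks) and files that are too new.
        if f.is_symlink() or f.stat().st_mtime >= cutoff:
            continue
        dest = Path(archive_dir) / f.name
        shutil.move(str(f), str(dest))  # copy to the archive, remove local
        f.symlink_to(dest)              # keep the original path valid
        moved.append(f.name)
    return moved
```

Run from cron (or launchd on macOS) with data_dir pointed at ~/.screenpipe/data/ and archive_dir at the NAS mount point.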
You said
Yes, I will do that. Is there a way to set up languages for transcription?
Gemini said
That is a solid plan.
Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription, yes, you can configure it, but it helps to understand how ScreenPipe handles language natively first.

How the Transcription Engine Handles Language
By default, ScreenPipe uses a multilingual Whisper model and sets the language configuration to Auto-Detect. When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage: If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback: Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.

How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language.
This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
Open the ScreenPipe settings.
Navigate to the Audio or AI/Transcription section.
Look for the Whisper Language or Transcription Language dropdown.
Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).
If you are running ScreenPipe via CLI/Config:
You can modify your underlying configuration (usually found in ~/.screenpipe/pipe.json or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:
sk (Slovak)
bg (Bulgarian)
en (English)
If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
Conversation with Gemini
You said
Explain to me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work-in-progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system's audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the incoming audio into manageable chunks to prepare them for the next stage.
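The chunking step can be pictured as slicing a continuous sample stream into fixed-length windows. Here is a minimal sketch; the 30-second chunk length is an illustrative assumption, not ScreenPipe's actual setting.

```python
def chunk_samples(samples, sample_rate=16000, chunk_seconds=30):
    """Split a flat sequence of PCM samples into fixed-length chunks.

    The chunk length is illustrative; the real recorder picks its own
    window size before handing audio to the transcriber. The final
    chunk may be shorter than the rest.
    """
    size = sample_rate * chunk_seconds
    return [samples[i:i + size] for i in range(0, len(samples), size)]
```

Each chunk then moves through the transcription pipeline independently, which is why language detection (discussed later in this conversation) happens per chunk.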
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
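The "local by default, cloud if configured" design is essentially a pluggable backend interface. Here is a minimal sketch of the idea; the class and method names are invented for illustration and are not ScreenPipe's actual API.

```python
from abc import ABC, abstractmethod

class TranscriptionBackend(ABC):
    """Illustrative interface: local Whisper vs. a cloud provider."""
    @abstractmethod
    def transcribe(self, audio_chunk: bytes) -> str: ...

class LocalWhisperBackend(TranscriptionBackend):
    def transcribe(self, audio_chunk: bytes) -> str:
        # In reality this would invoke a local Whisper model.
        return f"[local transcript of {len(audio_chunk)} bytes]"

class CloudBackend(TranscriptionBackend):
    def __init__(self, api_key: str):
        self.api_key = api_key
    def transcribe(self, audio_chunk: bytes) -> str:
        # In reality this would send the chunk to a provider like Deepgram.
        return f"[cloud transcript of {len(audio_chunk)} bytes]"

def process_chunk(backend: TranscriptionBackend, chunk: bytes) -> str:
    # The pipeline doesn't care which backend is plugged in.
    return backend.transcribe(chunk)
```

Swapping the engine then means constructing a different backend, not rewriting the pipeline.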
Diarization: As it transcribes the text, the engine also performs "diarization", a technical term for speaker identification: it analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
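Because the capture layer already records the microphone and system audio as separate streams, a crude first pass at "who said what" falls out of the source alone. Real diarization within a single stream (voice embeddings plus clustering) is far more involved; this sketch, with invented names, only shows the stream-level split.

```python
def label_by_stream(segments):
    """Assign a coarse speaker label from the capture source.

    `segments` is a list of (source, text) pairs where source is "mic"
    or "system", a simplification of the recorder's two streams.
    Separating multiple speakers *within* one stream requires real
    diarization, which this deliberately does not attempt.
    """
    labels = {"mic": "you", "system": "others"}
    return [(labels.get(src, "unknown"), text) for src, text in segments]
```

For a Zoom call, everything arriving on the system stream would be labeled "others" here; a real diarizer would further split those voices apart.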
Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
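FTS5 is a standard SQLite extension, so you can see the mechanism in miniature from Python. The table name here is illustrative, not ScreenPipe's schema, and this assumes your Python build ships SQLite with FTS5 enabled (most modern builds do).

```python
import sqlite3

def fts_demo():
    """Minimal FTS5 round trip: index two transcripts, search one word."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE transcripts USING fts5(text)")
    conn.execute("INSERT INTO transcripts VALUES ('quarterly budget review meeting')")
    conn.execute("INSERT INTO transcripts VALUES ('weekend hiking plans')")
    # MATCH uses the full-text index rather than scanning every row.
    hits = conn.execute(
        "SELECT text FROM transcripts WHERE transcripts MATCH 'budget'"
    ).fetchall()
    conn.close()
    return [t for (t,) in hits]
```

This token-based index is what makes "find that phrase from three weeks ago" effectively instant even over months of transcripts.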
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
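The buffer, queue, and finalize steps above form a classic producer/consumer pattern. A compact single-threaded model with Python's standard queue (all names illustrative):

```python
import queue

def run_pipeline(chunks, transcribe):
    """Tiny model of the WIP stage: chunks enter a queue (buffering),
    are transcribed one at a time (the processing queue draining as
    compute allows), and results are 'committed' to a list standing
    in for the SQLite database."""
    backlog = queue.Queue()
    for c in chunks:
        backlog.put(c)          # buffering: raw audio piles up
    committed = []
    while not backlog.empty():
        chunk = backlog.get()   # dequeue when the engine has capacity
        committed.append(transcribe(chunk))  # finalization
    return committed
```

In the real system the producer (recorder) and consumer (Whisper) run concurrently, so the backlog grows whenever audio arrives faster than the model can transcribe it.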
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
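Resolving that directory per platform might look like the sketch below. The exact Windows subfolder is an assumption (the description above only says "your AppData folder"), so verify it against your own install.

```python
import os
import sys
from pathlib import Path

def screenpipe_data_dir(platform=None, home=None, appdata=None):
    """Best-guess location of the ScreenPipe directory.

    ~/.screenpipe on Mac/Linux is per the description above; the
    Windows AppData path is an assumption to be checked locally.
    """
    platform = platform or sys.platform
    home = Path(home or Path.home())
    if platform.startswith("win"):
        base = Path(appdata or os.environ.get(
            "APPDATA", home / "AppData" / "Roaming"))
        return base / "screenpipe"
    return home / ".screenpipe"
```

Pointing scripts (like the archive job earlier) at this resolver keeps them portable across machines.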
The SQLite Database (db.sqlite): This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The media folders (such as ~/.screenpipe/data/): This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary chunk files, or locked database journals (such as SQLite's -wal or -journal sidecar files), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
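Turning that heuristic into a check is straightforward. This is only a sketch: the -wal/-shm/-journal names are SQLite's standard sidecar files, but a -wal file can also linger after a clean shutdown, so treat presence as a hint rather than proof of active work.

```python
from pathlib import Path

def looks_busy(screenpipe_dir):
    """Heuristic: SQLite sidecar files next to the database suggest an
    open write transaction, i.e. the WIP stage is active.

    Returns True if any *.sqlite-wal / -shm / -journal file exists.
    """
    d = Path(screenpipe_dir)
    sidecars = [p for suffix in ("-wal", "-shm", "-journal")
                for p in d.glob(f"*.sqlite{suffix}")]
    return len(sidecars) > 0
```

Combined with watching the temp chunk files shrink, this gives a rough "is it caught up yet?" signal without querying the database.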
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text: it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth to verify what was actually said.
How the Database Views Them
If you were to open up your db.sqlite and query the tables (e.g., audio_transcriptions), you would see that the database treats the ~/.screenpipe/data/ folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like soundcore AeroClip (input)_2026-05-12_07-40-48.mp4.
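Conceptually, resolving a transcript hit back to its playable media file is a join on that soft foreign key. The two-table layout below is an assumed, simplified schema for illustration, not a dump of ScreenPipe's real tables:

```python
import sqlite3

# Illustrative schema: transcript rows carry a soft foreign key
# (audio_chunk_id) into a table that stores the on-disk file path.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE audio_chunks (id INTEGER PRIMARY KEY, file_path TEXT);
    CREATE TABLE audio_transcriptions (
        audio_chunk_id INTEGER, offset_index INTEGER, transcription TEXT
    );
    INSERT INTO audio_chunks VALUES
        (1, '/Users/lukas/.screenpipe/data/soundcore AeroClip (input)_2026-05-12_07-40-48.mp4');
    INSERT INTO audio_transcriptions VALUES (1, 0, 'ok, action items for today');
""")

# From a transcript hit, recover which .mp4 to open for playback.
path, text = conn.execute("""
    SELECT c.file_path, t.transcription
    FROM audio_transcriptions t
    JOIN audio_chunks c ON c.id = t.audio_chunk_id
    WHERE t.transcription LIKE '%action items%'
""").fetchone()
print(path)
```

Nothing enforces that the file at that path still exists, which is exactly why deleting media by hand leaves the text searchable but the playback broken.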
What happens if you delete them?
If you manually rm those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
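If you want to know which transcripts have already lost their audio, you can cross-check the paths stored in the database against the filesystem. A small sketch; the query that would feed file_paths is assumed (something like SELECT file_path FROM audio_chunks, schema not guaranteed):

```python
from pathlib import Path

def find_orphaned_refs(file_paths):
    """Return DB-referenced media paths that no longer exist on disk."""
    return [p for p in file_paths if not Path(p).exists()]

# Demo: one path that certainly doesn't exist, one that does
# (the current working directory, as a stand-in for a surviving file).
missing = find_orphaned_refs([
    "/Users/lukas/.screenpipe/data/deleted_long_ago.mp4",
    str(Path.cwd()),
])
print(missing)
```

Running something like this before a cleanup tells you which rows are already text-only.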
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection: ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving: If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly move older files to your 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
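That archiving idea can be sketched in a few lines of Python: move old media, leave a symlink behind so the absolute paths stored in db.sqlite keep resolving. The demo runs on throwaway directories, since your actual data directory and NAS mount point aren't assumed here:

```python
import os
import shutil
import tempfile
import time
from pathlib import Path

def archive_old_media(data_dir: Path, nas_dir: Path, max_age_days: int) -> list:
    """Move media older than max_age_days to the archive volume,
    leaving symlinks behind so DB-stored paths keep resolving."""
    cutoff = time.time() - max_age_days * 86400
    moved = []
    for f in list(data_dir.glob("*.mp4")):
        if f.is_symlink() or f.stat().st_mtime > cutoff:
            continue  # already archived, or still fresh
        dest = (nas_dir / f.name).resolve()
        shutil.move(str(f), dest)  # copy+delete works across filesystems
        f.symlink_to(dest)         # playback via the old path still works
        moved.append(f.name)
    return moved

# Self-contained demo on temp dirs (stand-ins for ~/.screenpipe/data
# and the NAS mount).
data = Path(tempfile.mkdtemp())
nas = Path(tempfile.mkdtemp())
old = data / "System Audio (output)_2026-05-11_06-17-14.mp4"
old.write_bytes(b"fake media")
os.utime(old, (0, 0))  # backdate far past any cutoff
print(archive_old_media(data, nas, 30))
```

Skipping files that are already symlinks makes the job idempotent, so it is safe to run from cron repeatedly.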
You said
yes I will do that. Is there a way to setup languages to transcription?
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription, yes, you can configure it, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
By default, ScreenPipe uses a multilingual Whisper model and sets the language configuration to Auto-Detect.
When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage: If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback: Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
Open the ScreenPipe settings.
Navigate to the Audio or AI/Transcription section.
Look for the Whisper Language or Transcription Language dropdown.
Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).
If you are running ScreenPipe via CLI/Config:
You can modify your underlying configuration (usually found in ~/.screenpipe/pipe.json or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:
sk (Slovak)
bg (Bulgarian)
en (English)
If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
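For the config-file route, the key name below is purely hypothetical; check screenpipe --help or your version's documentation for the real option name before relying on it. The sketch only shows the shape such a setting would take:

```json
{
  "transcription": {
    "language": "sk"
  }
}
```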
If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). 
Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy 
prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. 
Are these used for anything after transcribing and storing in sqlite db?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Expand","depth":21,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Yes, those","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files are absolutely still used after the transcription process is complete. 
They are not temporary processing chunks; they act as your permanent media archive.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is exactly what those files are doing sitting in your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"directory and how they interact with the rest of the system.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
Conversation with Gemini
You said
Explain to me: how does the ScreenPipe audio recording work? Where does it start? How is it getting transferred and transcribed? Is there some work-in-progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
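The chunking step described above can be sketched in a few lines. This is an illustrative sketch, not ScreenPipe's actual code; the 30-second chunk length and 16 kHz sample rate are assumptions made for the example.

```python
def chunk_samples(samples, sample_rate, chunk_seconds=30):
    """Split a continuous stream of audio samples into fixed-length chunks.

    ScreenPipe's real chunk length and buffering strategy may differ;
    chunk_seconds=30 is an illustrative assumption.
    """
    size = sample_rate * chunk_seconds
    return [samples[i:i + size] for i in range(0, len(samples), size)]

# 70 seconds of silent fake audio at 16 kHz -> three chunks (30 s, 30 s, 10 s)
stream = [0] * (16_000 * 70)
chunks = chunk_samples(stream, 16_000)
print([len(c) // 16_000 for c in chunks])  # -> [30, 30, 10]
```

Each resulting chunk is small enough to transcribe independently, which is what makes the queue-based pipeline in the next section possible.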
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage:
The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
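As a concrete illustration of that last step, here is a minimal FTS5 round trip using Python's built-in sqlite3 module. The table name and columns are invented for the example; ScreenPipe's real schema differs.

```python
import sqlite3

# In-memory stand-in for ScreenPipe's database; the real schema differs,
# but FTS5 full-text search works the same way.
con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE transcripts USING fts5(text, timestamp)")
con.execute(
    "INSERT INTO transcripts VALUES (?, ?)",
    ("let's move the deadline to Friday", "2026-05-12T06:49:17"),
)
con.execute(
    "INSERT INTO transcripts VALUES (?, ?)",
    ("lunch order for the team", "2026-05-12T12:17:23"),
)

# Full-text search: find the meeting where a deadline was mentioned.
row = con.execute(
    "SELECT text, timestamp FROM transcripts WHERE transcripts MATCH 'deadline'"
).fetchone()
print(row)
```

The MATCH query hits an inverted index rather than scanning every row, which is why searching weeks of transcripts stays fast.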
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
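The buffer, queue, and finalize steps above can be sketched as a tiny producer/consumer, where transcribe() is a hypothetical stand-in for the local Whisper call:

```python
import queue
import threading

def transcribe(chunk):
    # Hypothetical stand-in for the local Whisper transcription call.
    return f"text for {chunk}"

work = queue.Queue()   # the processing queue of raw audio chunks
done = []              # stands in for rows committed to the database

def worker():
    while True:
        chunk = work.get()
        if chunk is None:                # sentinel: capture has stopped
            break
        done.append(transcribe(chunk))   # "commit" the finished text
        work.task_done()

t = threading.Thread(target=worker)
t.start()
for name in ["chunk_001", "chunk_002", "chunk_003"]:
    work.put(name)                       # capture side: enqueue raw audio
work.put(None)
t.join()
print(done)  # -> ['text for chunk_001', 'text for chunk_002', 'text for chunk_003']
```

If chunks arrive faster than the worker can transcribe them, they simply accumulate in the queue; that backlog is the WIP stage.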
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux, or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (db.sqlite):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data folder:
This is where the compressed raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary chunks, or locked database journals (such as SQLite -wal or -journal files), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
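Putting the folder-structure logic together, a short script can list media files that have no matching row in the database yet. This is a hedged sketch: the table and column names (audio_chunks, file_path) are illustrative guesses, not ScreenPipe's actual schema.

```python
import os
import sqlite3

def unprocessed_files(data_dir, db_path):
    """Return .mp4 files on disk that the database does not reference yet.

    Assumes a table audio_chunks with a file_path column; ScreenPipe's
    real schema likely differs, so adjust the query to match it.
    """
    con = sqlite3.connect(db_path)
    indexed = {row[0] for row in con.execute("SELECT file_path FROM audio_chunks")}
    on_disk = {
        os.path.join(data_dir, f)
        for f in os.listdir(data_dir)
        if f.endswith(".mp4")
    }
    return sorted(on_disk - indexed)   # files not yet in the master ledger
```

Anything this returns is still in (or waiting for) the WIP stage; an empty result means the ledger has caught up with the archive.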
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific...
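Assuming the timestamps work the way the filenames suggest, the playback seek can be sketched as the difference between the transcript timestamp and the start time encoded in the media file's name. This is an illustrative sketch, not ScreenPipe's actual playback code.

```python
import re
from datetime import datetime

def seek_offset(media_file, transcript_time):
    """Seconds into the media file where a transcript line was spoken.

    Assumes the filename pattern seen in ~/.screenpipe/data/, e.g.
    "System Audio (output)_2026-05-11_06-17-14.mp4", where the suffix
    is the chunk's start time.
    """
    m = re.search(r"_(\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})\.mp4$", media_file)
    start = datetime.strptime(m.group(1), "%Y-%m-%d_%H-%M-%S")
    return (transcript_time - start).total_seconds()

# A line transcribed at 06:20:14 sits three minutes into this chunk.
t = datetime(2026, 5, 11, 6, 20, 14)
print(seek_offset("System Audio (output)_2026-05-11_06-17-14.mp4", t))  # -> 180.0
```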
Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. 
How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. 
Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. 
The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. 
If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. 
If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). 
Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy 
prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. 
Are these used for anything after transcribing and storing in sqlite db?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Expand","depth":21,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Yes, those","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files are absolutely still used after the transcription process is complete. 
They are not temporary processing chunks; they act as your permanent media archive.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is exactly what those files are doing sitting in your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"directory and how they interact with the rest of the system.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"1. The \"Time Machine\" Playback","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. The \"Time Machine\" Playback","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The primary reason ScreenPipe keeps these files is for audio playback. 
When Whisper transcribes your meetings or ambient audio, it writes the text into the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"database along with","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"precise timestamps and file path references","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
You said
Explain to me: how does ScreenPipe audio recording work? Where does it start? How does it get transferred and transcribed? Is there a work-in-progress stage? And how can I tell from the folder structure what has and hasn't been processed?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
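The chunking idea described above can be sketched in a few lines; note that the chunk length and sample rate here are illustrative assumptions, not ScreenPipe's actual values:

```python
# Sketch: split a continuous stream of audio samples into fixed-length
# chunks, the way a 24/7 recorder hands work to a transcription queue.
# SAMPLE_RATE and CHUNK_SECONDS are illustrative assumptions.
SAMPLE_RATE = 16_000      # samples per second (typical for speech models)
CHUNK_SECONDS = 30        # assumed chunk length

def chunk_stream(samples, sample_rate=SAMPLE_RATE, chunk_seconds=CHUNK_SECONDS):
    """Yield successive fixed-length chunks of a sample buffer."""
    size = sample_rate * chunk_seconds
    for start in range(0, len(samples), size):
        yield samples[start:start + size]

# 65 seconds of (fake) audio becomes three chunks: 30s, 30s, and 5s.
fake_audio = [0.0] * (SAMPLE_RATE * 65)
chunks = list(chunk_stream(fake_audio))
print([len(c) // SAMPLE_RATE for c in chunks])  # -> [30, 30, 5]
```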
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: As it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
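The search side of this can be sketched with Python's built-in sqlite3 module. The table name and columns below are illustrative, not ScreenPipe's real schema; only the FTS5 mechanism itself is the point:

```python
import sqlite3

# Illustrative schema: ScreenPipe's actual tables differ, but the FTS5
# idea is the same -- a virtual table indexes the transcript text.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE transcripts USING fts5(ts, speaker, text)")
db.executemany(
    "INSERT INTO transcripts VALUES (?, ?, ?)",
    [
        ("2026-05-12T06:49:17", "speaker_0", "let's move the archive to the NAS"),
        ("2026-05-12T12:17:23", "speaker_1", "the quarterly budget looks fine"),
    ],
)

# MATCH uses the full-text index, so this stays fast even across
# months of accumulated transcriptions.
rows = db.execute(
    "SELECT ts, speaker FROM transcripts WHERE transcripts MATCH ?",
    ("archive",),
).fetchall()
print(rows)  # -> [('2026-05-12T06:49:17', 'speaker_0')]
```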
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (db.sqlite): This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data folders: This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary audio chunks, or locked database journals (like db.sqlite-wal or db.sqlite-journal), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
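As a rough sketch of that "done vs. in progress" distinction, the snippet below classifies filenames by suffix. The suffix conventions are general SQLite/temp-file conventions, not guaranteed to match ScreenPipe's exact naming:

```python
from pathlib import PurePosixPath

# Sketch: classify entries in a ScreenPipe-style data directory as
# "archive" (permanent media / database) vs "wip" (SQLite journals,
# temp chunks). The suffix list is an assumption, not ScreenPipe's spec.
WIP_SUFFIXES = (".sqlite-wal", ".sqlite-shm", ".sqlite-journal", ".tmp", ".part")

def classify(name: str) -> str:
    suffix = "".join(PurePosixPath(name).suffixes)  # handles "db.sqlite-wal"
    if any(suffix.endswith(s) for s in WIP_SUFFIXES):
        return "wip"
    return "archive"

files = [
    "System Audio (output)_2026-05-11_06-17-14.mp4",
    "db.sqlite",
    "db.sqlite-wal",
    "chunk_0042.tmp",
]
print({f: classify(f) for f in files})
```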
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in the SQLite DB?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text; it uses the file path stored in the database to pull up that specific...
|
38864
|
NULL
|
NULL
|
NULL
|
|
38864
|
1440
|
10
|
2026-05-14T06:30:05.218488+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740205218_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Mute tab
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database: This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data folders: This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary chunk files, or locked database journals (such as SQLite -wal or -journal files), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
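One way to spot the WIP stage from the filesystem is to look for those SQLite journal files. The helper below is a hypothetical sketch: the glob patterns and the db.sqlite name are assumptions, so match them against your actual database filename:

```python
import tempfile
from pathlib import Path

def processing_backlog(root: Path) -> bool:
    """Heuristic: -wal / -journal files next to the database suggest
    writes are still in flight (the WIP stage)."""
    return any(root.glob("*-wal")) or any(root.glob("*-journal"))

# Demonstrate against a throwaway directory rather than the live folder;
# swap in Path.home() / ".screenpipe" to check your own install.
demo = Path(tempfile.mkdtemp())
(demo / "db.sqlite-wal").touch()
print(processing_backlog(demo))  # True
```

An absent journal file does not guarantee the backlog is empty (chunks may still sit in RAM), so treat this as a quick indicator, not proof.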
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
38862
|
1440
|
9
|
2026-05-14T06:30:04.038342+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740204038_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Mute tab
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Gemini said...
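As an aside, the filenames quoted in the question encode the device, stream direction, and start time. A small parser sketch (pattern inferred from these four examples, not from ScreenPipe documentation):

```python
import re
from datetime import datetime

# Pattern inferred from names like
# "System Audio (output)_2026-05-11_06-17-14.mp4".
PATTERN = re.compile(
    r"^(?P<device>.+) \((?P<direction>input|output)\)_"
    r"(?P<ts>\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})\.mp4$"
)

def parse_name(name: str) -> dict:
    m = PATTERN.match(name)
    if not m:
        raise ValueError(f"unrecognized filename: {name}")
    return {
        "device": m.group("device"),
        "direction": m.group("direction"),
        "started": datetime.strptime(m.group("ts"), "%Y-%m-%d_%H-%M-%S"),
    }

info = parse_name("System Audio (output)_2026-05-11_06-17-14.mp4")
print(info["device"], info["direction"], info["started"])
```

Device names with parentheses of their own could break this pattern, so treat it as a heuristic for scripting over the data folder rather than a guaranteed format.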
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXButton","text":"Mute 
tab","depth":5,"bounds":{"left":0.011469414,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.020113032,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: 
db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New 
Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat 
settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share 
conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? 
How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. 
Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. 
How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. 
Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. 
The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. 
If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. 
If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). 
You said
Explain to me: how does the ScreenPipe audio recording work? Where does it start? How is it getting transferred and transcribed? Is there some work-in-progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine. ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
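The chunking step above can be sketched in a few lines. Note this is an illustrative sketch only: the 30-second chunk length, the sample rate, and the `chunk_stream` function are assumptions for the example, not ScreenPipe's actual internals.

```python
# Illustrative sketch of slicing a continuous audio stream into fixed-size
# chunks before transcription. CHUNK_SECONDS and chunk_stream() are
# hypothetical names, not ScreenPipe's real code.
CHUNK_SECONDS = 30
SAMPLE_RATE = 16_000  # 16 kHz mono, a common rate for speech models

def chunk_stream(samples):
    """Yield lists of samples, each covering CHUNK_SECONDS of audio."""
    size = CHUNK_SECONDS * SAMPLE_RATE
    chunk = []
    for s in samples:
        chunk.append(s)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:  # flush the trailing partial chunk
        yield chunk

# 100 seconds of dummy audio: three full 30 s chunks plus a 10 s remainder
chunks = list(chunk_stream(range(100 * SAMPLE_RATE)))
```

The trailing-flush branch matters for a 24/7 recorder: when a stream stops mid-chunk, the partial chunk still gets handed to the transcriber rather than dropped.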
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: As it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
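The FTS5 part can be demonstrated with a tiny standalone sketch. The table and column names below are invented for illustration, not ScreenPipe's actual schema, and it assumes an SQLite build compiled with FTS5 (most modern Python distributions include it).

```python
import sqlite3

# Standalone FTS5 demo; "transcripts" and its columns are illustrative
# names, not ScreenPipe's real schema.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE transcripts USING fts5(text, speaker)")
db.executemany(
    "INSERT INTO transcripts VALUES (?, ?)",
    [
        ("let's move the deadline to Friday", "speaker_1"),
        ("the build is green again", "speaker_0"),
    ],
)

# MATCH goes through the full-text index, so lookups stay fast even
# over weeks of accumulated transcripts.
rows = db.execute(
    "SELECT speaker, text FROM transcripts WHERE transcripts MATCH 'deadline'"
).fetchall()
# rows holds only the row containing the word "deadline"
```

An ordinary `LIKE '%deadline%'` query would scan every row; `MATCH` against the FTS5 index is what makes instant search over months of audio practical.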
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite database (…): This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The … or … folders: This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary … chunks, or locked database journals (like …), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
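To tell "done" from "not done" programmatically, one simple approach is to diff the media filenames on disk against the filenames already referenced in the database. The helper below shows only the set logic; it is a hypothetical sketch, since the exact table names and path layout vary by ScreenPipe version.

```python
def pending_media(on_disk, in_db):
    """Return media files present on disk but not yet referenced in the
    database, i.e. still in the WIP stage.

    Inputs are plain filename strings; wiring this up to the real
    ~/.screenpipe layout and schema is left open, because those details
    are version-dependent (this helper is an assumption, not ScreenPipe API).
    """
    done = set(in_db)
    return [f for f in on_disk if f not in done]

on_disk = [
    "MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4",
    "System Audio (output)_2026-05-11_06-17-14.mp4",
]
in_db = ["System Audio (output)_2026-05-11_06-17-14.mp4"]
pending = pending_media(on_disk, in_db)
# pending contains only the microphone file: it has no transcript row yet
```

If this list stays non-empty and keeps growing, the transcription queue is falling behind; if it drains to empty, everything on disk has been indexed.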
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in the SQLite db?
Gemini said...
|
3876630403267576135
|
9212447352970443671
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said
Explain to me: how does the ScreenPipe audio recording work? Where does it start? How does it get transferred and transcribed? Is there some work-in-progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system's audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: As it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
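The FTS5 mechanism can be sketched in a few lines. Note that the table and column names below are illustrative stand-ins, not ScreenPipe's actual schema:

```python
import sqlite3

# Build a tiny in-memory FTS5 index and search it, the way a transcript
# store does. Table/column names here are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE transcriptions USING fts5(text, timestamp UNINDEXED)"
)
conn.execute(
    "INSERT INTO transcriptions VALUES (?, ?)",
    ("let's move the quarterly review to Thursday", "2026-04-23T10:15:00"),
)
# MATCH uses the inverted index, so this stays fast across months of audio.
rows = conn.execute(
    "SELECT text, timestamp FROM transcriptions WHERE transcriptions MATCH ?",
    ("quarterly review",),
).fetchall()
print(rows)
```

The UNINDEXED column keeps the timestamp out of the search index while still returning it with each hit.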
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
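The buffer → queue → commit flow can be sketched as a toy pipeline; `str.upper()` stands in for the Whisper transcription step:

```python
import queue

# Toy sketch of the WIP stage: chunks buffer up faster than the engine
# drains them, then each is transcribed and committed in arrival order.
chunks = queue.Queue()
for chunk_id in range(3):
    chunks.put(f"audio-chunk-{chunk_id}")   # buffering: raw chunks pile up

committed = []
while not chunks.empty():
    raw = chunks.get()                      # processing queue: oldest first
    committed.append(raw.upper())           # finalization: commit transcript
print(committed)
```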
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (db.sqlite): This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data folders: This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary chunks, or locked database journals (like db.sqlite-wal), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
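The "done vs. in flight" reading of a directory listing can be sketched as a rough heuristic. The suffix conventions used below (.tmp chunks, -wal/-journal files) are assumptions about typical artifacts, not documented ScreenPipe file names:

```python
from pathlib import PurePath

# Heuristic: finalized media carries archive extensions, while temp chunks
# and SQLite journal files signal the engine is still catching up.
def classify(filenames):
    done, in_flight = [], []
    for name in filenames:
        if PurePath(name).suffix in {".mp4", ".jpg"}:
            done.append(name)           # finalized media in the raw archive
        elif name.endswith((".tmp", "-wal", "-journal")):
            in_flight.append(name)      # WIP buffers or a locked DB journal
    return done, in_flight

sample = [
    "System Audio (output)_2026-05-11_06-17-14.mp4",
    "chunk_000417.tmp",
    "db.sqlite-wal",
]
done, wip = classify(sample)
print(done)   # the archived recording
print(wip)    # signs of an active processing backlog
```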
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text: it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth to verify what was actually said.
How the Database Views Them
If you were to open up your db.sqlite and query the tables (e.g., audio_transcriptions), you would see that the database treats the ~/.screenpipe/data/ folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like soundcore AeroClip (input)_2026-05-12_07-40-48.mp4.
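A minimal sketch of that soft-foreign-key idea, assuming a simplified audio_transcriptions layout (the column names and offset field below are illustrative, not ScreenPipe's exact schema):

```python
import sqlite3

# The database stores plain file paths and timestamps rather than enforced
# references; nothing stops the file on disk from disappearing.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audio_transcriptions "
    "(transcription TEXT, file_path TEXT, offset_s REAL)"
)
conn.execute(
    "INSERT INTO audio_transcriptions VALUES (?, ?, ?)",
    (
        "see you on thursday",
        "/Users/lukas/.screenpipe/data/System Audio (output)_2026-05-11_06-17-14.mp4",
        812.4,
    ),
)
# A player UI would resolve this path and seek to offset_s before playing.
path, offset = conn.execute(
    "SELECT file_path, offset_s FROM audio_transcriptions "
    "WHERE transcription LIKE '%thursday%'"
).fetchone()
print(path, offset)
```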
What happens if you delete them?
If you manually rm those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection: ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving: If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly move older files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
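That cron-job idea can be sketched roughly as follows. The NAS mount point and 30-day cutoff are placeholders for your own setup, and the assumption is that db.sqlite stores absolute paths that a symlink will satisfy:

```python
import shutil
import time
from pathlib import Path

# Hedged sketch: move media older than a cutoff to the NAS and leave a
# symlink behind so stored paths still resolve. DATA_DIR and NAS_DIR are
# placeholders for your actual directories.
DATA_DIR = Path.home() / ".screenpipe" / "data"
NAS_DIR = Path("/Volumes/raid/screenpipe-archive")  # assumed NAS mount point

def archive_old_media(data_dir=DATA_DIR, nas_dir=NAS_DIR, max_age_days=30):
    cutoff = time.time() - max_age_days * 86400
    nas_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for f in data_dir.glob("*.mp4"):
        if f.is_symlink() or f.stat().st_mtime > cutoff:
            continue                    # already archived, or still fresh
        dest = nas_dir / f.name
        shutil.move(str(f), str(dest))  # works across filesystems
        f.symlink_to(dest)              # keep database references valid
        moved.append(f.name)
    return moved
```

Run from cron (or launchd on macOS) it keeps the local SSD bounded while playback keeps working as long as the NAS is mounted.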
You said
Yes, I will do that. Is there a way to set up languages for transcription?
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription, yes, you can configure it, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
By default, ScreenPipe uses a multilingual Whisper model and sets the language configuration to Auto-Detect. When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage: If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback: Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
Open the ScreenPipe settings.
Navigate to the Audio or AI/Transcription section.
Look for the Whisper Language or...
Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. 
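The FTS5 search described above can be exercised with Python's built-in `sqlite3` module. This is a minimal sketch, not ScreenPipe's actual schema: the table and column names here are illustrative stand-ins.

```python
import sqlite3

# In-memory database standing in for a local transcript store like db.sqlite.
conn = sqlite3.connect(":memory:")

# An FTS5 virtual table indexes the transcript text for instant phrase search.
# (Table/column names are hypothetical, chosen for this illustration.)
conn.execute(
    "CREATE VIRTUAL TABLE transcripts USING fts5(body, device, ts)"
)
conn.executemany(
    "INSERT INTO transcripts VALUES (?, ?, ?)",
    [
        ("let's move the deadline to Friday",
         "MacBook Pro Microphone", "2026-05-12T12:17:23"),
        ("the RAID array finished rebuilding",
         "System Audio", "2026-05-11T06:17:14"),
    ],
)

# MATCH runs a full-text query against the index instead of scanning rows.
rows = conn.execute(
    "SELECT device, ts FROM transcripts WHERE transcripts MATCH 'deadline'"
).fetchall()
print(rows)  # [('MacBook Pro Microphone', '2026-05-12T12:17:23')]
```

This is why a phrase from a weeks-old meeting comes back instantly: the query hits the FTS index rather than re-reading every stored transcript.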
### 3. The "Work in Progress" (WIP) Stage

There is a slight bottleneck between *hearing* the audio and *saving* the text, which acts as the WIP stage:

- **Buffering:** Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
- **Processing Queue:** Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
- **Finalization:** Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.

### 4. Understanding the Folder Structure

All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like `~/.screenpipe` on Mac/Linux, or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:

- **The SQLite Database (…):** This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
- **The … or … folders:** This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
- **Temp Files:** If you see rapidly changing files, temporary … chunks, or locked database journals (like …), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.

Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
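The "done vs. pending" question above can also be answered mechanically: compare the media files on disk against the paths the database already references. A sketch under stated assumptions — the table name `audio_transcriptions` and a `file_path` column holding absolute paths are guesses to be checked against the real schema first:

```python
import sqlite3
from pathlib import Path

def pending_media(data_dir: str, db_path: str) -> list[str]:
    """Return media files present on disk but not yet referenced in the
    database, i.e. audio presumably still in the work-in-progress stage."""
    conn = sqlite3.connect(db_path)
    # Assumed schema: audio_transcriptions(file_path, ...). Inspect the real
    # database (e.g. with the sqlite3 CLI's .schema) and adjust accordingly.
    done = {row[0] for row in conn.execute(
        "SELECT DISTINCT file_path FROM audio_transcriptions")}
    conn.close()
    on_disk = sorted(str(p) for p in Path(data_dir).rglob("*.mp4"))
    return [p for p in on_disk if p not in done]
```

Run against the data directory and its database, anything this returns is recorded audio the transcription queue has not committed yet.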
**You said:**

I see the files in `~/.screenpipe/data/` as `LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4` or `MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4` or `soundcore AeroClip (input)_2026-05-12_07-40-48.mp4` or `System Audio (output)_2026-05-11_06-17-14.mp4`. Are these used for anything after transcribing and storing in the SQLite db?

**Gemini said:**

Yes, those `.mp4` files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.

Here is exactly what those files are doing sitting in your `~/.screenpipe/data/` directory and how they interact with the rest of the system.

### 1. The "Time Machine" Playback

The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the `db.sqlite` database along with **precise timestamps and file path references**.

When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text; it uses the file path stored in the database to pull up that specific `.mp4` file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.

### 2. Re-processing and Model Upgrades

Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw `.mp4` files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.

### 3. The Source of Truth for Hallucinations

Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that `.mp4` file is your fallback source of truth to verify what was actually said.

### How the Database Views Them

If you were to open up your `db.sqlite` and query the tables (e.g., `audio_transcriptions`), you would see that the database treats the `~/.screenpipe/data/` folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like `soundcore AeroClip (input)_2026-05-12_07-40-48.mp4`.

**What happens if you delete them?** If you manually `rm` those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.

### Managing the Storage Footprint

Because ScreenPipe records continuously, this folder will inevitably grow over time.

- **Built-in Garbage Collection:** ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
- **Archiving:** If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly … older files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
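The archiving idea above (move aged recordings to the NAS, leave symlinks behind so stored database paths keep resolving) fits in a few lines of Python. A minimal sketch; the directories are placeholders for your own data folder and RAID mount:

```python
import shutil
import time
from pathlib import Path

def archive_old_media(data_dir: str, archive_dir: str, max_age_days: float) -> int:
    """Move media files older than max_age_days to archive_dir and replace
    each with a symlink, so file paths stored in the database stay valid."""
    cutoff = time.time() - max_age_days * 86400
    moved = 0
    for src in Path(data_dir).rglob("*.mp4"):
        if src.is_symlink() or src.stat().st_mtime >= cutoff:
            continue  # already archived, or still recent enough to keep local
        dst = Path(archive_dir) / src.name
        shutil.move(str(src), dst)   # copy to the archive mount, remove original
        src.symlink_to(dst)          # leave a pointer at the old location
        moved += 1
    return moved
```

Scheduled nightly from cron, this keeps the local SSD lean while playback through the database's stored paths keeps working, as long as the NAS mount is available when the file is requested.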
**You said:**

Yes, I will do that. Is there a way to set up languages for transcription?

**Gemini said:**

That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS, if you end up building it) will keep your local SSD lean while keeping your database searchable.

As for setting up languages for transcription: **yes, you can configure it**, but it helps to understand how ScreenPipe handles language natively first.

### How the Transcription Engine Handles Language

By default, ScreenPipe uses a **multilingual Whisper model** and sets the language configuration to **Auto-Detect**. When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.

- **The Advantage:** If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
- **The Drawback:** Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.

### How to Force a Specific Language

If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing, because the model skips the detection phase.

**If you are using the Desktop App UI:**

1. Open the ScreenPipe settings.
2. Navigate to the **Audio** or **AI/Transcription** section.
3. Look for the **Whisper Language** or …
|
9114142236600951999
|
8632685918833748949
|
visual_change
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Explain to me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine. ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
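The chunking step described above can be sketched in a few lines. This is a minimal illustration, not ScreenPipe's actual code; the 16 kHz sample rate and 30-second window are assumptions for the example.

```python
def chunk_samples(samples, sample_rate=16_000, chunk_seconds=30):
    """Split a continuous sample stream into fixed-size chunks so each
    piece can be handed to the transcriber independently.
    The window length is illustrative, not ScreenPipe's real value."""
    size = sample_rate * chunk_seconds
    # The final chunk may be shorter than the window; that's expected.
    return [samples[i:i + size] for i in range(0, len(samples), size)]
```

A 65-second stream would yield two full 30-second chunks plus a 5-second remainder.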
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: As it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
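The FTS5 mechanism above can be demonstrated with Python's built-in sqlite3 module. The table and column names here are illustrative, not ScreenPipe's actual schema.

```python
import sqlite3

# Minimal sketch of FTS5-backed transcript search (illustrative schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE transcripts USING fts5(text, timestamp UNINDEXED)"
)
conn.executemany(
    "INSERT INTO transcripts VALUES (?, ?)",
    [
        ("we agreed to move the backup to the NAS", "2026-05-11T06:17:14"),
        ("unrelated chatter about lunch plans", "2026-05-12T12:17:23"),
    ],
)

# MATCH hits the full-text index, so this stays fast across months of audio.
rows = conn.execute(
    "SELECT text, timestamp FROM transcripts WHERE transcripts MATCH ?",
    ("backup",),
).fetchall()
```

Only the row containing the token "backup" comes back, along with its timestamp for jumping to the source audio.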
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (db.sqlite): This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data folders: This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary audio chunks, or locked database journals (such as a -wal or -journal file next to the database), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
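A quick heuristic check for that WIP state can be scripted. This is a sketch under the assumptions above: the database is named db.sqlite (as elsewhere in this conversation), and a -wal or -journal sibling file signals an in-flight write. Note that in WAL mode a -wal file can linger between checkpoints, so treat this as a hint, not proof.

```python
from pathlib import Path

def pipeline_state(screenpipe_dir):
    """Rough heuristic: a SQLite -wal or -journal file next to db.sqlite
    usually means the transcriber is still committing a backlog.
    The db filename follows this conversation; adjust if yours differs."""
    root = Path(screenpipe_dir)
    busy = any((root / f"db.sqlite{suffix}").exists()
               for suffix in ("-wal", "-journal"))
    return "processing backlog" if busy else "caught up"
```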
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive. Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references. When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text: it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
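The lookup the UI performs can be sketched as a simple query: text hit in, media path and offset out. The schema here (audio_transcriptions with file_path and offset_seconds columns) is an assumption for illustration; ScreenPipe's real tables may differ.

```python
import sqlite3

# Illustrative schema: a text hit resolves to a media file plus an offset.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audio_transcriptions "
    "(transcription TEXT, file_path TEXT, offset_seconds REAL)"
)
conn.execute(
    "INSERT INTO audio_transcriptions VALUES (?, ?, ?)",
    ("let's move the backup to the NAS",
     "/Users/me/.screenpipe/data/System Audio (output)_2026-05-11_06-17-14.mp4",
     734.0),
)

# A player would now open `path` and seek to `offset` seconds.
path, offset = conn.execute(
    "SELECT file_path, offset_seconds FROM audio_transcriptions "
    "WHERE transcription LIKE ?",
    ("%NAS%",),
).fetchone()
```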
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth to verify what was actually said.
How the Database Views Them
If you were to open up your db.sqlite and query the tables (e.g., audio_transcriptions), you would see that the database treats the ~/.screenpipe/data/ folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like soundcore AeroClip (input)_2026-05-12_07-40-48.mp4.
What happens if you delete them? If you manually rm those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
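Because the references are only soft foreign keys, you can audit for that failure mode yourself. A sketch, reusing the illustrative audio_transcriptions/file_path names from above (verify against the real schema before relying on it):

```python
import sqlite3
from pathlib import Path

def dangling_media(db_path):
    """Return media paths the database still references but that no
    longer exist on disk, i.e. transcripts whose playback would fail.
    Table/column names are this conversation's example, not verified."""
    conn = sqlite3.connect(db_path)
    paths = [row[0] for row in
             conn.execute("SELECT DISTINCT file_path FROM audio_transcriptions")]
    conn.close()
    # expanduser() handles paths stored with a leading ~
    return [p for p in paths if not Path(p).expanduser().exists()]
```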
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection: ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving: If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly move older files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
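That move-and-symlink job can be sketched as follows. The paths, the .mp4 glob, and the 30-day retention window are illustrative assumptions; point src_dir at the data folder and archive_dir at the NAS mount.

```python
import shutil
import time
from pathlib import Path

def archive_old_media(src_dir, archive_dir, max_age_days=30):
    """Move media older than max_age_days to an archive location
    (e.g. a NAS mount) and leave a symlink behind, so the file paths
    stored in the database keep resolving for playback.
    Retention window and file pattern are illustrative."""
    cutoff = time.time() - max_age_days * 86400
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    moved = 0
    for f in Path(src_dir).glob("*.mp4"):
        # Skip files already archived, and anything still recent.
        if f.is_symlink() or f.stat().st_mtime > cutoff:
            continue
        target = archive / f.name
        shutil.move(str(f), str(target))  # works across filesystems
        f.symlink_to(target)              # keep the original path valid
        moved += 1
    return moved
```

Run it from cron (or launchd on macOS) so the local folder stays lean while the database references stay intact.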
You said
yes I will do that. Is there a way to setup languages to transcription?
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable. As for setting up languages for transcription, yes, you can configure it, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
By default, ScreenPipe uses a multilingual Whisper model and sets the language configuration to Auto-Detect.
When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage: If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback: Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
Open the ScreenPipe settings.
Navigate to the Audio or AI/Transcription section.
Look for the Whisper Language or...
Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. 
The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. 
If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. 
If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). 
Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false}]...
|
-3552479194466422982
|
9212447352970443671
|
click
|
accessibility
|
NULL
|
Conversation with Gemini
You said
Explain to me: how does ScreenPipe audio recording work? Where does it start? How is it getting transferred and transcribed? Is there a work-in-progress stage? How can I see from the folder structure what has been done and what hasn't?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
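That chunking step can be sketched in a few lines. The numbers below (16 kHz mono 16-bit PCM, 30-second chunks) are illustrative assumptions, not ScreenPipe's actual parameters:

```python
# Sketch: split a continuous PCM byte stream into fixed-length chunks.
# Assumed format (not from ScreenPipe's source): 16 kHz mono 16-bit audio
# cut into 30-second chunks; the real engine may use different values.
SAMPLE_RATE = 16_000      # samples per second
BYTES_PER_SAMPLE = 2      # 16-bit PCM
CHUNK_SECONDS = 30

CHUNK_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_SECONDS  # 960,000

def chunk_stream(stream: bytes):
    """Yield fixed-size chunks; the final partial chunk is kept."""
    for start in range(0, len(stream), CHUNK_BYTES):
        yield stream[start:start + CHUNK_BYTES]

# Example: 65 seconds of silence -> two full chunks plus a 5-second tail.
audio = bytes(SAMPLE_RATE * BYTES_PER_SAMPLE * 65)
chunks = list(chunk_stream(audio))
```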
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: the raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: as it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage: the final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
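The FTS5 lookup described here can be demonstrated with Python's built-in sqlite3 module. The transcripts table and its columns are hypothetical, chosen only to illustrate the mechanism; ScreenPipe's real schema may differ:

```python
import sqlite3

# Illustrative FTS5 index; the table name and columns are assumptions,
# not ScreenPipe's actual schema.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE VIRTUAL TABLE transcripts USING fts5(timestamp, speaker, text)"
)
con.execute(
    "INSERT INTO transcripts VALUES "
    "('2026-04-23T10:05:00', 'speaker_1', "
    "'let us revisit the quarterly budget next sprint')"
)
# MATCH is the FTS5 full-text query operator.
rows = con.execute(
    "SELECT timestamp, speaker FROM transcripts "
    "WHERE transcripts MATCH 'quarterly budget'"
).fetchall()
```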
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
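Assuming this works like a standard producer/consumer pipeline (the names below are stand-ins, not ScreenPipe internals), the buffer-queue-finalize flow might be sketched as:

```python
import queue

def transcribe(chunk: bytes) -> str:
    # Stand-in for the local Whisper call; returns fake text.
    return f"[{len(chunk)} bytes transcribed]"

pending: queue.Queue = queue.Queue()
for chunk in (b"chunk-a", b"chunk-b", b"chunk-c"):
    pending.put(chunk)            # buffering: raw audio waits its turn

database = []                     # stand-in for the SQLite commit step
while not pending.empty():        # the engine drains the backlog
    database.append(transcribe(pending.get()))
```

If chunks arrive faster than `transcribe` drains them, the queue grows; that growing backlog is the WIP stage the text describes.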
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux, or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite database: this is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The media folders: this is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: if you see rapidly changing files, temporary chunks, or locked database journals, this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
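One way to spot that WIP state from a script is to look for SQLite's WAL/journal side files next to the database. The suffixes below come from standard SQLite behaviour; which of them ScreenPipe actually leaves around is an assumption:

```python
import tempfile
from pathlib import Path

# SQLite leaves "-wal"/"-shm" (WAL mode) or "-journal" side files while
# writes are in flight; their presence hints at active processing.
WIP_SUFFIXES = ("-wal", "-shm", "-journal")

def backlog_indicators(root: Path) -> list:
    """Names of journal/WAL files found directly under `root`."""
    return sorted(
        p.name for p in root.iterdir()
        if p.is_file() and p.name.endswith(WIP_SUFFIXES)
    )

# Demo on a throwaway directory standing in for the data directory:
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "db.sqlite").touch()
    (root / "db.sqlite-wal").touch()
    busy = backlog_indicators(root)
```

An empty result suggests the engine has caught up; a lingering WAL file suggests commits are still happening.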
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. 
Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. 
The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. 
If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"}]...
|
-8748234000105090305
|
9138139058697034391
|
click
|
accessibility
|
NULL
|
Conversation with Gemini
You said
Explain to me: how does the ScreenPipe audio recording work? Where does it start? How does it get transferred and transcribed? Is there a work-in-progress stage? And how can I tell from the folder structure what has and hasn't been processed?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
- System Audio: Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
- Microphone Input: Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
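The chunking step can be sketched as a pure function: a continuous PCM stream is cut into fixed-duration segments. This is a minimal sketch; the 30-second chunk length and 16 kHz sample rate are illustrative assumptions, not ScreenPipe's actual configuration.

```python
# Sketch: split a continuous mono PCM stream into fixed-duration chunks.
# Chunk length and sample rate are assumptions for illustration only.

def chunk_pcm(samples: list, sample_rate: int, chunk_seconds: int = 30) -> list:
    """Cut a sample stream into consecutive chunks of chunk_seconds each.

    The final partial chunk is kept so no audio is lost at shutdown.
    """
    chunk_len = sample_rate * chunk_seconds
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]

# Example: 65 seconds of silence at 16 kHz -> two full chunks + one partial.
stream = [0.0] * (16_000 * 65)
chunks = chunk_pcm(stream, sample_rate=16_000)
print(len(chunks))       # 3
print(len(chunks[-1]))   # 80000 samples = the trailing 5 seconds
```

Keeping the trailing partial chunk matters for a 24/7 recorder: when capture stops mid-chunk, the remainder still gets queued for transcription instead of being dropped.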
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
- The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
- Diarization: As it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
- Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
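The storage step can be sketched with Python's built-in sqlite3 module, assuming the interpreter's SQLite build includes the FTS5 extension (most modern builds do). The table name and columns below are hypothetical stand-ins, not ScreenPipe's actual schema.

```python
import sqlite3

# Hypothetical schema for illustration -- not ScreenPipe's real table layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE transcripts USING fts5(speaker, text, timestamp)"
)
conn.executemany(
    "INSERT INTO transcripts VALUES (?, ?, ?)",
    [
        ("speaker_0", "let's move the archive to the NAS", "2026-05-12T06:49:17"),
        ("speaker_1", "the SQLite database is the master ledger", "2026-05-12T06:50:02"),
    ],
)

# Full-text search: find the chunk where someone mentioned the ledger.
row = conn.execute(
    "SELECT speaker, timestamp FROM transcripts WHERE transcripts MATCH ?",
    ("ledger",),
).fetchone()
print(row)  # ('speaker_1', '2026-05-12T06:50:02')
```

An FTS5 virtual table tokenizes the text at insert time, which is what makes a `MATCH` query over weeks of transcripts effectively instant compared to a `LIKE '%...%'` scan.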
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
- Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
- Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
- Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
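The buffer, queue, and commit steps form a classic producer/consumer pipeline, which can be sketched with Python's standard `queue` and `threading` modules. The `fake_transcribe` function is a stub standing in for the real Whisper call, and the in-memory list stands in for the SQLite commit.

```python
import queue
import threading

# Sketch of the WIP stage: the capture side enqueues raw chunks, a worker
# "transcribes" them (stubbed) and commits the results in order.
chunk_queue = queue.Queue()
committed = []  # stand-in for rows committed to the SQLite database

def fake_transcribe(chunk: bytes) -> str:
    # Stand-in for the local Whisper call. Real transcription is CPU/GPU
    # bound, which is why chunks can pile up during fast conversations.
    return f"transcript of {len(chunk)} bytes"

def worker() -> None:
    while True:
        chunk = chunk_queue.get()
        if chunk is None:            # sentinel: capture has stopped
            break
        committed.append(fake_transcribe(chunk))

t = threading.Thread(target=worker)
t.start()
for chunk in (b"a" * 100, b"b" * 200, b"c" * 300):
    chunk_queue.put(chunk)           # capture side can run ahead of the worker
chunk_queue.put(None)
t.join()
print(committed)
```

The queue is exactly the "backlog" you can observe as temporary files on disk: the producer never blocks on transcription speed, and the worker drains the backlog whenever CPU/GPU time frees up.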
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
- The SQLite database: This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
- The media folders: This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
- Temp files: If you see rapidly changing files, temporary chunks, or locked database journals, this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
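The "done vs. pending" check described above amounts to comparing filenames on disk against filenames recorded in the database. Here is a minimal sketch of that comparison; the `indexed` set stands in for the result of a database query, since the real table and column names are not shown here.

```python
import tempfile
from pathlib import Path

# Sketch: classify media files as "done" (already indexed in the database)
# or "pending" (still waiting in the WIP queue). The `indexed` set is a
# hypothetical stand-in for a SELECT over the transcription table.
def classify(media_dir: Path, indexed: set) -> dict:
    done, pending = [], []
    for f in sorted(media_dir.glob("*.mp4")):
        (done if f.name in indexed else pending).append(f.name)
    return {"done": done, "pending": pending}

# Usage: two files on disk, only one recorded in the database.
with tempfile.TemporaryDirectory() as d:
    for name in ("a.mp4", "b.mp4"):
        (Path(d) / name).touch()
    print(classify(Path(d), indexed={"a.mp4"}))
    # {'done': ['a.mp4'], 'pending': ['b.mp4']}
```

In practice "pending" files should shrink over time as the worker catches up; a file that stays pending indefinitely suggests the processing queue has stalled.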
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see files in ~/.screenpipe/data/ such as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4, MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4, soundcore AeroClip (input)_2026-05-12_07-40-48.mp4, and System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcription and storage in the SQLite db?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive. Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback...
Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. 
The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. 
If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. 
If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). 
Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy 
prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. 
Are these used for anything after transcribing and storing in sqlite db?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Expand","depth":21,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Yes, those","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files are absolutely still used after the transcription process is complete. 
They are not temporary processing chunks; they act as your permanent media archive.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is exactly what those files are doing sitting in your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"directory and how they interact with the rest of the system.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"1. The \"Time Machine\" Playback","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"}]...
You said
Explain to me: how does the ScreenPipe audio recording work? Where does it start? How is it getting transferred and transcribed? Is there some work-in-progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
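The chunking step described above can be sketched as fixed-duration windowing over a continuous sample stream. A minimal illustration in Python; the 30-second chunk length and 16 kHz sample rate are assumptions for the example, not ScreenPipe's actual settings:

```python
# Minimal sketch: split a continuous audio sample stream into
# fixed-duration chunks, as a 24/7 recorder must do before handing
# audio to a transcription engine.
# NOTE: chunk length and sample rate are illustrative assumptions.

SAMPLE_RATE = 16_000      # samples per second (assumed)
CHUNK_SECONDS = 30        # assumed chunk duration

def chunk_samples(samples, rate=SAMPLE_RATE, seconds=CHUNK_SECONDS):
    """Yield successive fixed-length chunks; the final chunk may be short."""
    step = rate * seconds
    for start in range(0, len(samples), step):
        yield samples[start:start + step]

# 75 seconds of fake audio -> chunks of 30 s, 30 s, 15 s
stream = [0.0] * (SAMPLE_RATE * 75)
sizes = [len(c) for c in chunk_samples(stream)]
print(sizes)  # [480000, 480000, 240000]
```

In a real recorder the chunks would be encoded to disk as they close, which is exactly why the data folder fills with timestamped media files.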
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: As it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
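The FTS5 indexing described in the Storage step can be demonstrated with Python's built-in sqlite3 module. The table name and columns below are hypothetical stand-ins, not ScreenPipe's actual schema:

```python
import sqlite3

# Hypothetical schema for illustration; ScreenPipe's real tables differ.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE VIRTUAL TABLE transcripts USING fts5(speaker, text, timestamp)"
)
con.executemany(
    "INSERT INTO transcripts VALUES (?, ?, ?)",
    [
        ("speaker_0", "let's move the standup to Thursday", "2026-05-12T09:00:17"),
        ("speaker_1", "the NAS backup finished overnight", "2026-05-12T09:01:02"),
    ],
)

# Full-text search: find the chunk where "standup" was said.
row = con.execute(
    "SELECT speaker, timestamp FROM transcripts WHERE transcripts MATCH ?",
    ("standup",),
).fetchone()
print(row)  # ('speaker_0', '2026-05-12T09:00:17')
```

The MATCH query hits the inverted index rather than scanning every row, which is what makes "find a phrase from three weeks ago" effectively instant.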
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
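The buffer-queue-finalize flow above can be sketched with a bounded queue feeding a single transcription worker. Everything here (chunk names, the stand-in transcribe step) is illustrative, not ScreenPipe's implementation:

```python
import queue
import threading

# Illustrative sketch of the WIP stage: capture can push chunks faster
# than transcription drains them, so a queue absorbs the backlog.
backlog = queue.Queue()
finalized = []  # stands in for rows committed to SQLite

def transcription_worker():
    while True:
        chunk = backlog.get()
        if chunk is None:          # sentinel: capture stopped
            break
        # stand-in for Whisper: "transcribe" the chunk
        finalized.append(f"transcript of {chunk}")
        backlog.task_done()

worker = threading.Thread(target=transcription_worker)
worker.start()

# Capture side: enqueue chunks as they are recorded.
for i in range(3):
    backlog.put(f"chunk_{i}")
backlog.put(None)
worker.join()
print(finalized)
# ['transcript of chunk_0', 'transcript of chunk_1', 'transcript of chunk_2']
```

Chunks leave the queue in arrival order, which is why a busy conversation shows up in the database a little later than it was heard.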
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite database (db.sqlite): This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data folders: This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary chunks, or locked database journals, this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
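Putting the checklist above into practice: one way to see what has not yet been processed is to diff the media files on disk against the file paths referenced in the database. This is a hedged sketch; the table and column names (audio_chunks, file_path) are assumptions, not ScreenPipe's real schema:

```python
import sqlite3
from pathlib import Path

def unprocessed_files(data_dir, db_path):
    """Return media files on disk that the database does not reference yet.

    ASSUMPTION: a table audio_chunks(file_path) exists. Inspect your
    actual schema (e.g., `.schema` in the sqlite3 shell) before relying
    on this against a real ScreenPipe database.
    """
    con = sqlite3.connect(db_path)
    referenced = {
        row[0] for row in con.execute("SELECT file_path FROM audio_chunks")
    }
    con.close()
    on_disk = {str(p) for p in Path(data_dir).glob("*.mp4")}
    return sorted(on_disk - referenced)
```

Files that appear in the result are still in the WIP stage: recorded, but not yet committed to the transcript ledger.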
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text: it uses the file path stored in the database to pull up that specific .mp4 file and start playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
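The filenames quoted earlier (e.g., `System Audio (output)_2026-05-11_06-17-14.mp4`) encode device, direction, and start time, which is what lets playback jump to an absolute moment. A sketch of parsing that convention; the pattern is inferred from the example names in this conversation, not from ScreenPipe's source:

```python
import re
from datetime import datetime

# Pattern inferred from observed names like
#   "MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4"
NAME_RE = re.compile(
    r"^(?P<device>.+) \((?P<direction>input|output)\)_"
    r"(?P<date>\d{4}-\d{2}-\d{2})_(?P<time>\d{2}-\d{2}-\d{2})\.mp4$"
)

def parse_recording_name(name):
    """Return (device, direction, start datetime), or None if unrecognized."""
    m = NAME_RE.match(name)
    if not m:
        return None
    started = datetime.strptime(
        f"{m['date']} {m['time']}", "%Y-%m-%d %H-%M-%S"
    )
    return m["device"], m["direction"], started

print(parse_recording_name("System Audio (output)_2026-05-11_06-17-14.mp4"))
# ('System Audio', 'output', datetime.datetime(2026, 5, 11, 6, 17, 14))
```

Given a file's start time, seeking to a keyword is just the database timestamp minus the parsed start, as an offset into the recording.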
2. Re-processing and Model Upgrades...
settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share 
conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? 
How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. 
Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. 
How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. 
Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. 
The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. 
If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. 
If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). 
Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy 
prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. 
Are these used for anything after transcribing and storing in sqlite db?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Expand","depth":21,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Yes, those","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files are absolutely still used after the transcription process is complete. 
They are not temporary processing chunks; they act as your permanent media archive.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is exactly what those files are doing sitting in your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"directory and how they interact with the rest of the system.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"1. The \"Time Machine\" Playback","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. The \"Time Machine\" Playback","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The primary reason ScreenPipe keeps these files is for audio playback. 
When Whisper transcribes your meetings or ambient audio, it writes the text into the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"database along with","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"precise timestamps and file path references","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. Re-processing and Model Upgrades","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"}]...
Conversation with Gemini
You said
Explain to me: how does the ScreenPipe audio recording work? Where does it start? How is it getting transferred and transcribed? Is there some work-in-progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system's audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
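The chunking step described here can be sketched as a generator that slices a continuous sample stream into fixed-duration segments. This is a minimal illustration, not ScreenPipe's actual code; the 30-second chunk length and 16 kHz sample rate are assumptions for the example.

```python
# Sketch: split a continuous audio stream into fixed-duration chunks.
# Chunk length and sample rate are illustrative assumptions, not
# ScreenPipe's real configuration.
from typing import Iterable, Iterator, List

SAMPLE_RATE = 16_000           # samples per second (assumed)
CHUNK_SECONDS = 30             # duration of one transcription chunk (assumed)
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS

def chunk_stream(samples: Iterable[float]) -> Iterator[List[float]]:
    """Yield fixed-size chunks; the final partial chunk is flushed too."""
    buf: List[float] = []
    for s in samples:
        buf.append(s)
        if len(buf) == CHUNK_SAMPLES:
            yield buf
            buf = []
    if buf:                    # flush whatever is left when the stream ends
        yield buf

# 65 seconds of silence -> two full 30 s chunks plus a 5 s remainder
chunks = list(chunk_stream([0.0] * (SAMPLE_RATE * 65)))
print([len(c) // SAMPLE_RATE for c in chunks])   # → [30, 30, 5]
```

Chunking like this is what lets a 24/7 recorder hand the transcriber bounded units of work instead of one endless stream.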
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: As it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
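The FTS5 search mechanism mentioned here can be demonstrated with Python's built-in sqlite3 module. The table and column names below are illustrative only, not ScreenPipe's actual schema.

```python
# Sketch: full-text search over transcripts with SQLite FTS5.
# Table/column names are illustrative, not ScreenPipe's real schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE transcripts USING fts5(speaker, text, timestamp)"
)
conn.executemany(
    "INSERT INTO transcripts VALUES (?, ?, ?)",
    [
        ("speaker_0", "let's move the backup to the NAS tonight", "2026-05-12T09:14:03"),
        ("speaker_1", "the quarterly report is due on Friday", "2026-05-12T09:15:40"),
    ],
)

# MATCH uses the FTS5 index, so this stays fast even with months of audio.
rows = conn.execute(
    "SELECT speaker, timestamp FROM transcripts WHERE transcripts MATCH ?",
    ("backup",),
).fetchall()
print(rows)   # → [('speaker_0', '2026-05-12T09:14:03')]
```

The key point is that MATCH hits an inverted index rather than scanning every row, which is what makes "find a phrase from three weeks ago" instant.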
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
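The buffer-queue-commit flow above can be sketched with a plain FIFO queue. This is a simplified single-threaded illustration; a real pipeline runs the transcriber on a worker thread, and the transcribe function here is a stub, not Whisper's API.

```python
# Sketch: chunks queue up faster than the transcriber drains them,
# then get committed once processed. The transcriber is a stub.
from collections import deque

def transcribe(chunk: bytes) -> str:
    """Stand-in for a local Whisper call (assumption, not the real API)."""
    return f"<{len(chunk)} bytes of speech>"

queue: deque = deque()                 # WIP stage: buffered, untranscribed
database: list = []                    # stand-in for the SQLite commit

# Recorder side: three chunks arrive while the transcriber is busy.
for chunk in (b"a" * 100, b"b" * 200, b"c" * 50):
    queue.append(chunk)

print("backlog:", len(queue))          # → backlog: 3

# Transcriber side: drain the queue in arrival order and commit each result.
while queue:
    database.append(transcribe(queue.popleft()))

print("backlog:", len(queue))          # → backlog: 0
print(database[0])                     # → <100 bytes of speech>
```

The backlog count going from 3 to 0 is exactly the "WIP stage" the answer describes: data that has been heard but not yet committed.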
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite database (db.sqlite): This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data folders: This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary chunks, or locked database journals, this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
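A quick way to apply this checklist is a small script that classifies what it finds under the data directory. The extensions and journal suffixes below are assumptions for illustration (SQLite's -wal/-journal files are standard, but the .tmp chunk naming is hypothetical); adjust them to what your install actually produces. The example runs against a throwaway directory rather than the real ~/.screenpipe.

```python
# Sketch: classify files in a ScreenPipe-style data directory as
# "archived" media vs. signs of in-progress work. Suffixes are
# illustrative assumptions, not guaranteed ScreenPipe behavior.
import tempfile
from pathlib import Path

def classify(root: Path) -> dict:
    report = {"archive": [], "wip": []}
    for p in sorted(root.rglob("*")):
        if not p.is_file():
            continue
        if p.suffix in {".mp4", ".jpg"}:             # permanent media archive
            report["archive"].append(p.name)
        elif p.name.endswith(("-wal", "-journal")):  # active DB journal = WIP
            report["wip"].append(p.name)
        elif p.suffix == ".tmp":                     # temp chunk (assumed name)
            report["wip"].append(p.name)
    return report

# Demo against a scratch directory instead of the real ~/.screenpipe:
root = Path(tempfile.mkdtemp())
for name in ("meeting.mp4", "frame.jpg", "db.sqlite-wal", "chunk01.tmp"):
    (root / name).touch()
print(classify(root))
# → {'archive': ['frame.jpg', 'meeting.mp4'], 'wip': ['chunk01.tmp', 'db.sqlite-wal']}
```

An empty "wip" list means the engine has caught up; anything listed there is the backlog described above.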
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Expand
Listen
Show thinking
Gemini said
Gemini said
Yes, those
.mp4
files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your
~/.screenpipe/data/
directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the
db.sqlite
database along with
precise timestamps and file path references
.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific
.mp4
file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades...
You said
Explain to me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system's audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: As it transcribes the text, the engine also performs "diarization", the technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
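As a minimal sketch of what that FTS5 index enables, using Python's built-in sqlite3 module (the table and column names here are illustrative, not ScreenPipe's actual schema):

```python
import sqlite3

# Toy in-memory stand-in for ScreenPipe's local database.
conn = sqlite3.connect(":memory:")

# An FTS5 virtual table is the index that makes phrase search instant.
conn.execute("CREATE VIRTUAL TABLE transcripts USING fts5(text, timestamp UNINDEXED)")
conn.executemany(
    "INSERT INTO transcripts (text, timestamp) VALUES (?, ?)",
    [
        ("let's move the deadline to Friday", "2026-04-23T10:05:12"),
        ("the RAID array finished resilvering overnight", "2026-04-25T08:41:03"),
    ],
)

# MATCH hits the full-text index instead of scanning every transcript row.
rows = conn.execute(
    "SELECT text, timestamp FROM transcripts WHERE transcripts MATCH ?",
    ("deadline",),
).fetchall()
print(rows)
```

The real schema differs, but the search mechanics are the same idea: a virtual FTS5 table queried with MATCH, returning the matching text plus the timestamp used to locate the recording.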
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (db.sqlite): This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data folder: This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary chunks, or locked database journals (like db.sqlite-wal or db.sqlite-journal), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
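In practice, "what has been done" boils down to diffing the media folder against the database: any media file with no database row referencing it is still waiting in the queue. A sketch on a toy database (the file_path column and exact table layout are assumptions about the schema, not verified):

```python
import sqlite3, tempfile, pathlib

# Stand-in for ~/.screenpipe/data/ with one processed and one pending file.
data_dir = pathlib.Path(tempfile.mkdtemp())
done = data_dir / "MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4"
pending = data_dir / "System Audio (output)_2026-05-14_06-30-00.mp4"
done.touch()
pending.touch()

# Stand-in for db.sqlite: only the processed file has a transcription row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audio_transcriptions (file_path TEXT, transcription TEXT)")
conn.execute("INSERT INTO audio_transcriptions VALUES (?, ?)", (str(done), "…"))

# Anything on disk that the database doesn't reference is backlog (WIP).
indexed = {row[0] for row in conn.execute("SELECT file_path FROM audio_transcriptions")}
backlog = sorted(str(p) for p in data_dir.glob("*.mp4") if str(p) not in indexed)
print(backlog)
```

Run against the real files and database, this kind of diff would list exactly the recordings that have not yet been transcribed.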
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text; it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
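Since each filename embeds the recording's start time, "the exact second" is just arithmetic: seek offset = the spoken-at timestamp minus the file's start timestamp. A sketch (seek_offset is a hypothetical helper; the real UI performs this lookup internally):

```python
from datetime import datetime

# Filename pattern observed in ~/.screenpipe/data/:
# "<device name>_YYYY-MM-DD_HH-MM-SS.mp4"
def seek_offset(file_name: str, spoken_at: datetime) -> float:
    """Seconds into the recording at which a transcribed phrase was spoken."""
    # The last two underscore-separated fields are the date and time.
    stamp = "_".join(file_name.rsplit("_", 2)[1:]).removesuffix(".mp4")
    started = datetime.strptime(stamp, "%Y-%m-%d_%H-%M-%S")
    return (spoken_at - started).total_seconds()

offset = seek_offset(
    "LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4",
    datetime(2026, 5, 12, 6, 51, 2),
)
print(offset)  # 105.0 seconds into the recording
```

Delete the .mp4 and the offset is still computable, but there is nothing left to seek into.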
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth to verify what was actually said.
How the Database Views Them
If you were to open up your db.sqlite and query the tables (e.g., audio_transcriptions), you would see that the database treats the ~/.screenpipe/data/ folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like soundcore AeroClip (input)_2026-05-12_07-40-48.mp4.
What happens if you delete them?
If you manually rm those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection: ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving: If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly move older files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
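A minimal sketch of that cron-job idea, demonstrated on temporary directories (the function name, paths, and age threshold are placeholders; test on copies before pointing it at real data):

```python
import os, shutil, time, tempfile, pathlib

def archive_old_media(data_dir, archive_dir, max_age_days=30):
    """Move old media files to archive storage, leaving symlinks behind
    so the database's file-path references keep resolving."""
    cutoff = time.time() - max_age_days * 86400
    for f in pathlib.Path(data_dir).glob("*.mp4"):
        if f.is_symlink() or f.stat().st_mtime > cutoff:
            continue  # already archived, or still fresh
        target = pathlib.Path(archive_dir) / f.name
        shutil.move(str(f), str(target))  # heavy bytes go to the NAS/RAID
        f.symlink_to(target)              # local path still works for playback

# Demo on temp dirs (stand-ins for ~/.screenpipe/data/ and a NAS mount).
data = pathlib.Path(tempfile.mkdtemp())
nas = pathlib.Path(tempfile.mkdtemp())
old = data / "System Audio (output)_2026-05-11_06-17-14.mp4"
old.write_bytes(b"fake audio")
os.utime(old, (time.time() - 40 * 86400,) * 2)  # pretend it's 40 days old
archive_old_media(data, nas, max_age_days=30)
print(old.is_symlink(), (nas / old.name).exists())
```

Because the symlink keeps the original path valid, playback from the ScreenPipe UI should continue to work as long as the NAS mount is available.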
You said
Yes, I will do that. Is there a way to set up languages for transcription?
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription, yes, you can configure it, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
By default, ScreenPipe uses a multilingual Whisper model and sets the language configuration to Auto-Detect.
When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage: If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback: Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
Open the ScreenPipe settings.
Navigate to the Audio or AI/Transcription section.
Look for the Whisper Language or Transcription Language dropdown.
Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).
If you are running ScreenPipe via CLI/Config:
You can modify your underlying configuration (usually found in ~/.screenpipe/pipe.json or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:
sk (Slovak)
bg (Bulgarian)
en (English)
If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — 
Archive","depth":5,"bounds":{"left":0.013297873,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: 
db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New 
Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat 
settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share 
conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? 
How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. 
Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. 
How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. 
Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. 
3. The "Work in Progress" (WIP) Stage

There is a slight bottleneck between hearing the audio and saving the text, and this acts as the WIP stage:

- Buffering: Audio is recorded into a temporary buffer in your system's RAM, or written as temporary chunk files on disk.
- Processing Queue: Whisper needs CPU/GPU time to transcribe. During a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
- Finalization: Once the model finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.

4. Understanding the Folder Structure

All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux, or in your AppData folder on Windows). You can tell what has been processed by looking at a few key areas in this directory:

- The SQLite database (db.sqlite): This is the master ledger.
If text exists inside this database, the corresponding audio has been fully transcribed, diarized, and is "done."
- The data folder: This is where the compressed raw audio files and JPEG screenshots are stored permanently. Think of it as the raw archive.
- Temp files: If you see rapidly changing files, temporary chunks, or locked database journals, the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.

Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
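The "finished vs. in flight" check above can be sketched in a few lines. The assumption that stable .mp4 files are finished output, while .tmp chunks and SQLite journal files mark the WIP stage, comes from the description above; the demo file names are invented:

```python
import os
import tempfile

# Sketch of the "done vs. work in progress" check: finished media is assumed
# to be stored as .mp4, and anything else (temp chunks, SQLite journals) is
# assumed to be in-flight processing state.
def classify(data_dir):
    done, in_progress = [], []
    for name in sorted(os.listdir(data_dir)):
        (done if name.endswith(".mp4") else in_progress).append(name)
    return done, in_progress

# Demo against a throwaway directory standing in for ~/.screenpipe/data/.
with tempfile.TemporaryDirectory() as d:
    for name in ("System Audio (output)_2026-05-11_06-17-14.mp4",
                 "chunk_0042.tmp", "db.sqlite-wal"):
        open(os.path.join(d, name), "w").close()
    done, in_progress = classify(d)
```

A `-wal` file next to the database is standard SQLite write-ahead-log naming, so seeing one simply means the database is open for writing, not that anything is wrong.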
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?

Gemini said
Yes, those .mp4 files are very much still used after the transcription process is complete.
They are not temporary processing chunks; they act as your permanent media archive. Here is exactly what those files are doing in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.

1. The "Time Machine" Playback

The primary reason ScreenPipe keeps these files is audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references. When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text: it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files you keep the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.

2. Re-processing and Model Upgrades

Local LLMs and transcription models are improving rapidly.
Right now, your system might be using a quantized Whisper model to balance CPU load against accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw .mp4 files lets you re-process the historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.

3. The Source of Truth for Hallucinations

Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth for verifying what was actually said.

How the Database Views Them

If you open db.sqlite and query the tables (e.g., audio_transcriptions), you will see that the database treats the ~/.screenpipe/data/ folder essentially as a blob storage backend.
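That soft-foreign-key layout can be probed with a throwaway database. The audio_transcriptions table name comes from the text above, but the column names (file_path, transcription) are assumptions, not ScreenPipe's real schema:

```python
import os
import sqlite3
import tempfile

# Throwaway database mimicking the "soft foreign key" layout: each transcript
# row stores a plain-text path to its .mp4. Column names are assumptions.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE audio_transcriptions (file_path TEXT, transcription TEXT)")

with tempfile.TemporaryDirectory() as data_dir:
    kept = os.path.join(data_dir, "MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4")
    open(kept, "w").close()  # this media file exists on disk
    gone = os.path.join(data_dir, "System Audio (output)_2026-05-11_06-17-14.mp4")  # never created
    db.executemany("INSERT INTO audio_transcriptions VALUES (?, ?)",
                   [(kept, "weekly sync"), (gone, "podcast audio")])

    # Nothing in SQLite enforces that the referenced file still exists, so
    # text search keeps working even where playback would fail.
    missing = [path for (path,) in db.execute("SELECT file_path FROM audio_transcriptions")
               if not os.path.exists(path)]
```

A check like this is also a quick way to audit how many old transcripts have lost their backing media.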
The tables contain columns acting as soft foreign keys that point to filenames like soundcore AeroClip (input)_2026-05-12_07-40-48.mp4.

What happens if you delete them?
If you manually rm those files to free up space on your Mac's SSD, ScreenPipe won't crash, and the SQLite database will still hold all of your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.

Managing the Storage Footprint

Because ScreenPipe records continuously, this folder will inevitably grow over time.

- Built-in garbage collection: ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days, or once the folder hits a specific GB limit).
- Archiving: If you prefer to keep a permanent, searchable "life log" without eating up local SSD space, you don't have to let ScreenPipe delete anything. You could script a simple cron job that regularly moves older files to your massive 4-drive RAID 5 array, replacing the local copies with symlinks. That keeps the database references intact while offloading the heavy media storage to your NAS.
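The cron-job idea can be sketched as below. This is a minimal, hypothetical version, not a tested migration script: the 30-day threshold, the .mp4 filter, and the throwaway directories standing in for ~/.screenpipe/data/ and the NAS mount are all assumptions.

```python
import os
import shutil
import tempfile
import time

# Move media older than `days` to an archive volume, leaving a symlink
# behind so the paths recorded in the database keep resolving.
def archive_old_media(data_dir, archive_dir, days=30):
    cutoff = time.time() - days * 86400
    moved = []
    for name in os.listdir(data_dir):
        src = os.path.join(data_dir, name)
        if name.endswith(".mp4") and os.path.getmtime(src) < cutoff:
            dst = os.path.join(archive_dir, name)
            shutil.move(src, dst)
            os.symlink(dst, src)  # playback via the original path still works
            moved.append(name)
    return moved

# Demo with throwaway directories standing in for the data folder and NAS.
with tempfile.TemporaryDirectory() as data_dir, tempfile.TemporaryDirectory() as nas:
    old = os.path.join(data_dir, "System Audio (output)_2026-05-11_06-17-14.mp4")
    open(old, "w").close()
    os.utime(old, (time.time() - 40 * 86400,) * 2)  # pretend it is 40 days old
    fresh = os.path.join(data_dir, "MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4")
    open(fresh, "w").close()
    moved = archive_old_media(data_dir, nas, days=30)
```

In a real setup this would run nightly from cron against the live data directory, with the NAS mount as the archive target; test it on a handful of files first to confirm playback follows the symlinks.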
You said
Yes, I will do that. Is there a way to set up languages for transcription?

Gemini said
That is a solid plan.
Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS, if you end up building it) will keep your local SSD lean while keeping your database searchable.

As for setting up languages for transcription: yes, you can configure it, but it helps to understand how ScreenPipe handles language natively first.

How the Transcription Engine Handles Language

By default, ScreenPipe uses a multilingual Whisper model and sets the language configuration to Auto-Detect. When a new audio chunk comes in, the model listens to the first few seconds, guesses the language, and transcribes the rest of that chunk under that assumption.

- The advantage: If you are watching a tutorial in English and an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe switches gears automatically without you touching the settings.
- The drawback: Auto-detect can guess wrong, especially when there is heavy background noise, when someone opens a sentence with a foreign loanword, or when you rapidly code-switch between Slovak and English in the same conversation. When it guesses wrong, it may try to force Slovak words into English spelling, producing gibberish.

How to Force a Specific Language

If you know you will primarily be recording in one language for a while and want maximum accuracy, you can set the language explicitly. This also speeds up processing slightly, because the model skips the detection phase.

If you are using the desktop app UI:

- Open the ScreenPipe settings.
- Navigate to the Audio or AI/Transcription section.
- Look for the Whisper Language or Transcription Language dropdown.
- Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).

If you are running ScreenPipe via the CLI/config:

You can modify the underlying configuration (usually found in ~/.screenpipe/pipe.json, or by passing a flag when starting the daemon) to force a specific language code, using the standard ISO codes:

- sk (Slovak)
- bg (Bulgarian)
- en (English)

If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and, if your hardware has the headroom, run a slightly larger and more accurate Whisper model (like Whisper Small or Base instead of Tiny).
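If you go the config-file route, a scripted version might look like the sketch below. The "language" key name is a guess for illustration only; check the actual contents of your pipe.json and the daemon's flags before relying on it.

```python
import json
import os
import tempfile

# ISO 639-1 codes for the three languages discussed above.
ISO_CODES = {"Slovak": "sk", "Bulgarian": "bg", "English": "en"}

# Force a transcription language by editing a JSON config file. The key name
# "language" is an assumption for illustration, not a documented setting.
def force_language(config_path, language_name):
    config = {}
    if os.path.exists(config_path):
        with open(config_path) as f:
            config = json.load(f)
    config["language"] = ISO_CODES[language_name]
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)
    return config

# Demo against a throwaway file standing in for ~/.screenpipe/pipe.json.
with tempfile.TemporaryDirectory() as d:
    cfg = force_language(os.path.join(d, "pipe.json"), "Slovak")
```

Restart the daemon after any config change so the new language setting is picked up.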
button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":23,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Redo","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Share and export","depth":22,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":22,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXTextArea","text":"Ask Gemini","depth":20,"bounds":{"left":0.08211436,"top":0.83439744,"width":0.22573139,"height":0.01915403},"on_screen":true,"value":"Ask Gemini","help_text":"","role_description":"text entry area","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Ask Gemini","depth":21,"bounds":{"left":0.08211436,"top":0.8347965,"width":0.030086435,"height":0.018355945},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Open upload file 
menu","depth":20,"bounds":{"left":0.078125,"top":0.87031126,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Tools","depth":18,"bounds":{"left":0.094082445,"top":0.87031126,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false}]...
|
1067745657424994536
|
8632611149894527965
|
visual_change
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
You said
Explain to me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work-in-progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
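The chunking step above can be sketched in a few lines. This is a toy model, not ScreenPipe's implementation: the 30-second window at 16 kHz matches typical Whisper input sizes, but ScreenPipe's actual chunk length is an assumption here.

```python
# Slice a continuous stream of audio samples into fixed-length windows
# for the transcriber. Constants are illustrative assumptions.
SAMPLE_RATE = 16_000   # samples per second
CHUNK_SECONDS = 30     # assumed window length

def chunk_stream(samples: list) -> list:
    """Split a flat sample buffer into fixed-size chunks (last one may be short)."""
    size = SAMPLE_RATE * CHUNK_SECONDS
    return [samples[i:i + size] for i in range(0, len(samples), size)]
```

Seventy seconds of audio would come out as two full 30-second chunks plus a 10-second remainder.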
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: As it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
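A toy reconstruction of what that FTS5 search looks like. The table and column names here are illustrative, not ScreenPipe's real schema; your Python build must have SQLite compiled with FTS5 (standard on most platforms).

```python
import sqlite3

# In-memory stand-in for the transcript index; schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE transcripts USING fts5(text, file_path, ts)")
conn.execute(
    "INSERT INTO transcripts VALUES (?, ?, ?)",
    (
        "let's check the RAID array tonight",
        "System Audio (output)_2026-05-11_06-17-14.mp4",
        "2026-05-11T06:17:14",
    ),
)

# MATCH does tokenized full-text search; ORDER BY rank sorts by relevance.
hits = conn.execute(
    "SELECT file_path, ts FROM transcripts WHERE transcripts MATCH ? ORDER BY rank",
    ("RAID array",),
).fetchall()
print(hits)
```

The point of FTS5 is that the MATCH query uses an inverted index rather than scanning every row, which is why searches over weeks of transcripts stay instant.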
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (db.sqlite): This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data folder: This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary chunk files, or locked database journals, this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
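The done-vs-WIP check above can be sketched as a small script. The directory shape mimics the ~/.screenpipe layout just described; treating a SQLite -wal or -journal file as the "busy" signal is an assumption about how in-flight writes surface on disk, not documented ScreenPipe behavior.

```python
import tempfile
from pathlib import Path

# Build a fake ~/.screenpipe layout to scan (illustrative, not the real one).
root = Path(tempfile.mkdtemp())
(root / "data").mkdir()
(root / "data" / "System Audio (output)_2026-05-11_06-17-14.mp4").touch()
(root / "db.sqlite").touch()
(root / "db.sqlite-wal").touch()  # write-ahead log: uncommitted writes exist

def status(base: Path) -> dict:
    """Rough snapshot: archived media files plus whether the DB looks busy."""
    return {
        "archived_media": sorted(p.name for p in (base / "data").glob("*.mp4")),
        "db_busy": any(base.glob("db.sqlite-wal")) or any(base.glob("db.sqlite-journal")),
    }

print(status(root))
```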
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text: it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth to verify what was actually said.
How the Database Views Them
If you were to open up your db.sqlite and query the tables (e.g., audio_transcriptions), you would see that the database treats the ~/.screenpipe/data/ folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like soundcore AeroClip (input)_2026-05-12_07-40-48.mp4.
What happens if you delete them?
If you manually rm those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
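You can spot those broken soft foreign keys yourself with a short script: list every transcript row whose referenced media file no longer exists on disk. The table and column names follow the examples in this conversation but are assumptions about the real schema.

```python
import sqlite3
import tempfile
from pathlib import Path

# Fake media folder: one file exists, one was "deleted".
media_dir = Path(tempfile.mkdtemp())
kept = media_dir / "MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4"
kept.touch()
deleted = media_dir / "System Audio (output)_2026-05-11_06-17-14.mp4"  # never created

# Toy stand-in for db.sqlite's audio_transcriptions table (schema assumed).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audio_transcriptions (file_path TEXT, transcription TEXT)")
conn.executemany(
    "INSERT INTO audio_transcriptions VALUES (?, ?)",
    [(str(kept), "still playable"), (str(deleted), "text survives, audio gone")],
)

# Rows whose playback would silently fail.
orphans = [
    p for (p,) in conn.execute("SELECT file_path FROM audio_transcriptions")
    if not Path(p).exists()
]
print(orphans)
```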
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection: ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving: If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly move older files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
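A minimal sketch of that cron job, assuming the NAS is mounted on the same host; the mount point and 30-day threshold are illustrative choices, and you should verify symlinked paths actually play back before deleting anything.

```python
import shutil
import time
from pathlib import Path

def archive_old_media(data_dir: Path, nas_dir: Path, max_age_days: int = 30) -> list:
    """Move .mp4 files older than the cutoff to nas_dir, leaving symlinks behind
    so the file paths recorded in db.sqlite keep resolving."""
    nas_dir.mkdir(parents=True, exist_ok=True)
    cutoff = time.time() - max_age_days * 86_400
    moved = []
    for f in sorted(data_dir.glob("*.mp4")):
        if f.is_symlink() or f.stat().st_mtime > cutoff:
            continue  # already archived, or still recent
        target = nas_dir / f.name
        shutil.move(str(f), str(target))  # works across filesystems, unlike rename
        f.symlink_to(target)              # keep the original path resolvable
        moved.append(target)
    return moved
```

Run from cron (e.g. nightly), each pass only touches files that have aged past the cutoff and skips the symlinks it created on earlier passes.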
You said
yes I will do that. Is there a way to setup languages to transcription?
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription, yes, you can configure it, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
By default, ScreenPipe uses a multilingual Whisper model and sets the language configuration to Auto-Detect.
When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage: If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback: Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
1. Open the ScreenPipe settings.
2. Navigate to the Audio or AI/Transcription section.
3. Look for the Whisper Language or Transcription Language dropdown.
4. Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).
If you are running ScreenPipe via CLI/Config: You can modify your underlying configuration (usually found in ~/.screenpipe/pipe.json or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:
sk (Slovak)
bg (Bulgarian)
en (English)
If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
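As a hypothetical sketch of what forcing a language in that config file might look like (the key name "language" is an assumption; check the actual keys in the pipe.json referenced above before editing):

```json
{
  "language": "sk"
}
```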
How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. 
Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. 
The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. 
If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. 
If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). 
Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy 
prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. 
Are these used for anything after transcribing and storing in sqlite db?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Expand","depth":21,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Yes, those","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files are absolutely still used after the transcription process is complete. 
They are not temporary processing chunks; they act as your permanent media archive.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is exactly what those files are doing sitting in your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"directory and how they interact with the rest of the system.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"1. The \"Time Machine\" Playback","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. The \"Time Machine\" Playback","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"database along with","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
|
4258317559196419736
|
9207950348869356183
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said:
Explain to me: how does the ScreenPipe audio recording work? Where does it start? How is it getting transferred and transcribed? Is there some work-in-progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Gemini said:
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
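That chunking step can be sketched in a few lines of Python. The chunk length and sample rate below are illustrative placeholders, not ScreenPipe's actual settings:

```python
# Illustrative only: cutting a continuous capture stream into
# fixed-duration chunks before transcription. Values are made up.

SAMPLE_RATE = 16_000   # samples per second (a typical speech-to-text rate)
CHUNK_SECONDS = 30     # hypothetical chunk length

def chunk_stream(samples, sample_rate=SAMPLE_RATE, seconds=CHUNK_SECONDS):
    """Yield successive fixed-length chunks of an audio sample buffer."""
    size = sample_rate * seconds
    for start in range(0, len(samples), size):
        yield samples[start:start + size]

# A fake 75-second recording (one int per sample stands in for real PCM data).
recording = [0] * (SAMPLE_RATE * 75)
chunks = list(chunk_stream(recording))
print(len(chunks))                    # 3 chunks: 30s + 30s + a 15s remainder
print(len(chunks[-1]) / SAMPLE_RATE)  # 15.0
```

The last chunk is simply shorter than the rest; a real pipeline would hand each chunk to the transcription engine as it closes.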
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: As it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
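The FTS5 lookup works roughly like this minimal sketch. The table name and rows here are illustrative, not ScreenPipe's actual schema:

```python
import sqlite3

# Minimal FTS5 search demo. Table name and rows are illustrative;
# ScreenPipe's real schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE transcripts USING fts5(spoken_text, recorded_at)"
)
conn.executemany(
    "INSERT INTO transcripts VALUES (?, ?)",
    [
        ("let's move the deadline to Friday", "2026-05-12T06:49:00"),
        ("the quarterly numbers look fine", "2026-05-12T12:17:00"),
    ],
)

# FTS5's MATCH operator uses the full-text index, so the lookup stays fast
# even across weeks of accumulated transcripts.
row = conn.execute(
    "SELECT recorded_at FROM transcripts WHERE transcripts MATCH ?",
    ("deadline",),
).fetchone()
print(row[0])  # 2026-05-12T06:49:00
```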
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
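A toy model of that buffer-queue-finalize flow, with a thread standing in for the transcription worker (all names and the fake "transcription" are illustrative):

```python
import queue
import threading

# Toy model of the WIP stage: the capture side enqueues chunks while a
# worker thread (standing in for Whisper) drains the backlog.
pending = queue.Queue()   # the processing queue (the WIP backlog)
done = []                 # finalized results, as if committed to SQLite

def transcriber():
    while True:
        chunk = pending.get()
        if chunk is None:                        # sentinel: capture stopped
            break
        done.append(f"transcript of {chunk}")    # stand-in for real STT work

worker = threading.Thread(target=transcriber)
worker.start()

for i in range(5):            # capture side: five chunks arrive
    pending.put(f"chunk-{i}")
pending.put(None)             # signal end of audio
worker.join()
print(len(done))  # 5
```

While `pending` is non-empty, chunks exist that have been heard but not yet saved, which is exactly the in-flight state the folder inspection below reveals.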
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux, or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (db.sqlite): This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data folders: This is where the compressed raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary chunks, or locked database journals, this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
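You can reproduce that journal signal yourself, assuming the database is ordinary SQLite: in WAL mode, a sibling "-wal" file sits next to the database while writes are in flight. A throwaway demo (the temp paths are not ScreenPipe's real layout):

```python
import os
import sqlite3
import tempfile

# Demo of the "locked database journal" tell: with SQLite in WAL mode,
# a sibling "-wal" file appears next to the database while writes are
# in flight. Paths here are a throwaway temp dir.
workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "db.sqlite")

conn = sqlite3.connect(db_path)
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("CREATE TABLE frames (ocr_text TEXT)")  # illustrative table
conn.execute("INSERT INTO frames VALUES ('hello')")
conn.commit()

# Journal present while connections are open: work is in flight.
wal_present = os.path.exists(db_path + "-wal")
print(wal_present)  # True
conn.close()  # the last clean close checkpoints and removes the journal
```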
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said:
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in the SQLite DB?
Gemini said:
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific
.mp4
file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
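The lookup described above can be sketched with Python's built-in sqlite3 module. The audio_transcriptions table is named later in this answer, but the column names used here (file_path, start_seconds, transcription) are illustrative assumptions, not ScreenPipe's actual schema:

```python
import sqlite3

# Minimal stand-in schema; the real db.sqlite layout may differ.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE audio_transcriptions (
        id INTEGER PRIMARY KEY,
        file_path TEXT,       -- soft foreign key into ~/.screenpipe/data/
        start_seconds REAL,   -- offset of this utterance inside the file
        transcription TEXT
    )
""")
conn.execute(
    "INSERT INTO audio_transcriptions (file_path, start_seconds, transcription) "
    "VALUES (?, ?, ?)",
    ("System Audio (output)_2026-05-11_06-17-14.mp4", 42.5,
     "let's review the quarterly numbers"),
)

def find_utterance(keyword: str):
    """Return (file_path, start_seconds) for the first transcript matching keyword."""
    return conn.execute(
        "SELECT file_path, start_seconds FROM audio_transcriptions "
        "WHERE transcription LIKE ? ORDER BY id LIMIT 1",
        (f"%{keyword}%",),
    ).fetchone()

path, offset = find_utterance("quarterly")
# A player would now open `path` and seek to `offset` seconds.
```

A production setup would use the FTS5 index mentioned earlier rather than a LIKE scan, but the shape of the lookup (text match, then file path plus offset) is the same.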
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth to verify what was actually said.
How the Database Views Them
If you were to open up your db.sqlite and query the tables (e.g., audio_transcriptions), you would see that the database treats the ~/.screenpipe/data/ folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like...
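That soft-foreign-key relationship can also be checked from the other direction: an .mp4 on disk with no row pointing at it is either still in the WIP queue or orphaned. A sketch against a throwaway directory and an in-memory database (again, the file_path column name is an assumption; inspect the real schema first):

```python
import sqlite3
import tempfile
from pathlib import Path

def find_unreferenced(data_dir: Path, conn: sqlite3.Connection) -> list[str]:
    """List .mp4 files in data_dir that no transcription row points at."""
    referenced = {
        Path(row[0]).name
        for row in conn.execute("SELECT file_path FROM audio_transcriptions")
    }
    return sorted(p.name for p in data_dir.glob("*.mp4") if p.name not in referenced)

# Demo with fake data instead of the live ~/.screenpipe/data/ folder.
data_dir = Path(tempfile.mkdtemp())
(data_dir / "done.mp4").touch()
(data_dir / "pending.mp4").touch()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audio_transcriptions (file_path TEXT)")
conn.execute("INSERT INTO audio_transcriptions VALUES ('done.mp4')")

pending = find_unreferenced(data_dir, conn)  # → ['pending.mp4']
```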
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said
Explain to me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work-in-progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage:
The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
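As a concrete illustration of the FTS5 mechanism described above, the Python standard library's sqlite3 module can build and query such an index. The table and column names here are invented for the sketch, not ScreenPipe's actual schema:

```python
import sqlite3

# Minimal sketch of FTS5-backed transcript search (illustrative schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE transcripts USING fts5(text, device, timestamp UNINDEXED)"
)
conn.executemany(
    "INSERT INTO transcripts VALUES (?, ?, ?)",
    [
        ("let's move the meeting to Thursday",
         "MacBook Pro Microphone (input)", "2026-05-12T12:17:23"),
        ("the quarterly numbers look solid",
         "System Audio (output)", "2026-05-11T06:17:14"),
    ],
)
# MATCH consults the full-text index instead of scanning every row,
# which is what makes phrase search over weeks of audio instant.
rows = conn.execute(
    "SELECT device, timestamp FROM transcripts WHERE transcripts MATCH 'meeting'"
).fetchall()
print(rows)  # [('MacBook Pro Microphone (input)', '2026-05-12T12:17:23')]
```

FTS5 must be compiled into the linked SQLite library, which is the case for standard CPython builds.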
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
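The buffer → queue → finalize flow above can be sketched with a toy worker. The "transcription" step is faked here so the structure stays runnable; in the real pipeline it would be a Whisper call:

```python
import queue
import sqlite3
import threading

# Toy sketch of the WIP stage: chunks queue up, a worker "transcribes"
# them, and results are committed to SQLite with a lock around writes.
chunks = queue.Queue()
db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute("CREATE TABLE transcriptions (chunk_id INTEGER, text TEXT)")
lock = threading.Lock()

def worker():
    while True:
        chunk_id, audio = chunks.get()
        if chunk_id is None:  # sentinel: no more chunks, shut down
            return
        text = f"transcript of chunk {chunk_id}"  # stand-in for Whisper
        with lock:
            db.execute("INSERT INTO transcriptions VALUES (?, ?)",
                       (chunk_id, text))
            db.commit()

t = threading.Thread(target=worker)
t.start()
for i in range(3):
    chunks.put((i, b"raw-audio-bytes"))  # chunks lining up in the queue
chunks.put((None, None))
t.join()
print(db.execute("SELECT COUNT(*) FROM transcriptions").fetchone()[0])  # 3
```

While the queue is non-empty, the system is in exactly the WIP state described above: audio captured but text not yet committed.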
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (db.sqlite): This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data folders (under ~/.screenpipe/data/): This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary chunks, or locked database journals (like SQLite's -wal or -journal sidecar files), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
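One way to check for the in-flight markers described above: SQLite leaves -wal or -journal sidecar files next to the database while write transactions are open (a general SQLite fact, not ScreenPipe-specific). A short sketch that scans a directory for them, demonstrated against a throwaway directory rather than the real ~/.screenpipe:

```python
import pathlib
import tempfile

def wip_indicators(data_dir: pathlib.Path) -> list[str]:
    """Return names of files suggesting transcription is still in flight.

    Heuristic sketch: -wal/-journal sidecars mean an open SQLite write
    transaction; .tmp is an illustrative guess at temp-chunk naming.
    """
    return sorted(
        p.name for p in data_dir.iterdir()
        if p.name.endswith(("-wal", "-journal", ".tmp"))
    )

# Demo against a throwaway directory (stand-in for ~/.screenpipe).
root = pathlib.Path(tempfile.mkdtemp())
(root / "db.sqlite").touch()
(root / "db.sqlite-wal").touch()
print(wip_indicators(root))  # ['db.sqlite-wal']
```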
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in the SQLite db?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text: it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
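A sketch of how that seek position could be derived from the two stored pieces of data, assuming the <device>_YYYY-MM-DD_HH-MM-SS.mp4 filename convention visible in your folder and an ISO timestamp from the database. The function and its inputs are illustrative, not ScreenPipe's actual code:

```python
from datetime import datetime

def playback_offset(filename: str, spoken_at: str) -> float:
    """Seconds into the recording at which an utterance occurs.

    Sketch only: assumes a `<device>_YYYY-MM-DD_HH-MM-SS.mp4` filename
    (so the recording's start time is parseable from the name) and an
    ISO-format `spoken_at` timestamp as stored in the database.
    """
    stem = filename.rsplit(".", 1)[0]
    # Last two underscore-separated fields are date and time; this works
    # even when the device name itself contains spaces or parentheses.
    date_part, time_part = stem.split("_")[-2:]
    started = datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H-%M-%S")
    return (datetime.fromisoformat(spoken_at) - started).total_seconds()

offset = playback_offset(
    "System Audio (output)_2026-05-11_06-17-14.mp4", "2026-05-11T06:19:44"
)
print(offset)  # 150.0 -> seek two and a half minutes into the file
```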
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth to verify what was actually said.
How the Database Views Them
If you were to open up your db.sqlite and query the tables (e.g., audio_transcriptions), you would see that the database treats the ~/.screenpipe/data/ folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like...
The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. 
If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. 
If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). 
Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy 
prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. 
Are these used for anything after transcribing and storing in sqlite db?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Expand","depth":21,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Yes, those","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files are absolutely still used after the transcription process is complete. 
They are not temporary processing chunks; they act as your permanent media archive.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is exactly what those files are doing sitting in your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"directory and how they interact with the rest of the system.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"1. The \"Time Machine\" Playback","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. The \"Time Machine\" Playback","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"database along with","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"precise timestamps and file path references","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
|
-6646203381232203441
|
9207950348869323415
|
click
|
accessibility
|
NULL
|
You said
Explain to me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work-in-progress stage? How can I see from the folder structure what has been done and what hasn't been done?

Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: As it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
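The FTS5 search described above can be sketched in a few lines of Python. This is an illustration only: the table name `transcripts` and its columns are assumptions for the demo, not ScreenPipe's actual schema.

```python
import sqlite3

# Toy FTS5 table standing in for the real transcription index.
# Column names (text, path, ts) are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE transcripts USING fts5(text, path, ts)")
conn.executemany(
    "INSERT INTO transcripts VALUES (?, ?, ?)",
    [
        ("let's move the deadline to Friday", "audio_0412.mp4", "2026-04-12T10:00"),
        ("the RAID array finished rebuilding", "audio_0413.mp4", "2026-04-13T09:30"),
    ],
)
# Full-text MATCH finds the phrase instantly, along with the media
# file the snippet came from.
row = conn.execute(
    "SELECT path, ts FROM transcripts WHERE transcripts MATCH 'deadline'"
).fetchone()
print(row)  # ('audio_0412.mp4', '2026-04-12T10:00')
```

This is the same mechanism that lets the search return not just the matching text but the media file and timestamp it belongs to.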
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
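The buffer-then-queue-then-finalize flow above can be sketched as a toy pipeline (this is not ScreenPipe's code, just the shape of the idea): chunks pile up while the transcription engine is busy, then drain in order once it catches up.

```python
from collections import deque

# Capture side: chunks arrive faster than they can be transcribed,
# so they queue up (the WIP stage).
queue = deque()
for i in range(3):
    queue.append(f"chunk_{i}.wav")

# Engine side: drain the queue in order, "finalizing" each chunk by
# pairing it with its transcript (a stand-in for the DB commit).
finalized = []
while queue:
    chunk = queue.popleft()
    finalized.append((chunk, f"text for {chunk}"))
print(len(finalized))  # 3
```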
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (db.sqlite): This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The media folders (e.g., ~/.screenpipe/data/): This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary chunks, or locked database journals (like a -wal or -journal file next to the database), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
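One way to eyeball the done-vs-WIP split described above is to sort the data directory by extension. A minimal sketch, assuming finalized media ends in .mp4 and anything else is an in-flight chunk (ScreenPipe's real naming may differ); the demo runs against a throwaway temp directory standing in for ~/.screenpipe/data/.

```python
import tempfile
from pathlib import Path

def classify(data_dir: Path) -> dict:
    """Split a directory's files into finalized media vs. WIP chunks."""
    done, wip = [], []
    for f in sorted(data_dir.iterdir()):
        # Assumption: .mp4 means transcribed-and-archived; anything
        # else is treated as a temporary in-progress artifact.
        (done if f.suffix == ".mp4" else wip).append(f.name)
    return {"done": done, "wip": wip}

# Demo against a throwaway directory with one finalized recording
# and one pending chunk.
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    for name in ("System Audio (output)_2026-05-11.mp4", "chunk_001.wav"):
        (root / name).touch()
    status = classify(root)
print(status)
```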
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in the SQLite db?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text; it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth to verify what was actually said.
How the Database Views Them
If you were to open up your db.sqlite and query the tables (e.g., audio_transcriptions), you would see that the database treats the ~/.screenpipe/data/ folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like soundcore AeroClip (input)_2026-05-12_07-40-48.mp4.
What happens if you delete them?
If you manually rm those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
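The "soft foreign key" relationship above can be demonstrated directly: the database stores only a path string, so deleting the media file orphans the row without breaking the database. Table and column names here are illustrative assumptions, not ScreenPipe's real schema.

```python
import sqlite3
from pathlib import Path

# A row that references a media file purely by path, the way a
# soft foreign key to blob storage works.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audio_chunks (id INTEGER, file_path TEXT)")
conn.execute(
    "INSERT INTO audio_chunks VALUES (1, '/tmp/definitely-missing-demo.mp4')"
)

# The transcript row survives, but playback for any row whose file
# is gone would fail with "file not found".
orphans = [
    p
    for (p,) in conn.execute("SELECT file_path FROM audio_chunks")
    if not Path(p).exists()
]
print(orphans)
```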
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection: ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving: If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly move older files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
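The move-and-symlink idea above can be sketched as follows: relocate a media file to "NAS" storage and leave a symlink behind, so any path reference stored in the database keeps resolving. Both directories in the demo are throwaway stand-ins; a real script would also filter by file age.

```python
import shutil
import tempfile
from pathlib import Path

def archive(file: Path, nas_dir: Path) -> Path:
    """Move a file to the archive dir and leave a symlink in its place."""
    target = nas_dir / file.name
    shutil.move(file, target)   # offload the heavy media to the big array
    file.symlink_to(target)     # DB path references still resolve
    return target

# Demo with temp dirs standing in for the local SSD and the NAS mount.
with tempfile.TemporaryDirectory() as d:
    local, nas = Path(d, "local"), Path(d, "nas")
    local.mkdir()
    nas.mkdir()
    f = local / "System Audio (output)_2026-05-11.mp4"
    f.write_text("fake media")
    archive(f, nas)
    # Reading through the original path still works via the symlink.
    ok = f.is_symlink() and f.read_text() == "fake media"
print(ok)  # True
```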
You said
Yes, I will do that. Is there a way to set up languages for transcription?

Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription, yes, you can configure it, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
By default, ScreenPipe uses a
multilingual Whisper model
and sets the language configuration to
Auto-Detect
.
When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage: If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback: Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
Open the ScreenPipe settings.
Navigate to the Audio or AI/Transcription section.
Look for the Whisper Language or Transcription Language dropdown.
Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).
If you are running ScreenPipe via CLI/Config:
You can modify your underlying configuration (usually found in ~/.screenpipe/pipe.json or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes: sk (Slovak), bg (Bulgarian), en (English).
If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
However, any attempt to play back the audio for those older transcripts will silently fail or throw a \"file not found\" error in the background logs.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Managing the Storage Footprint","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Managing the Storage Footprint","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe records continuously, this folder will inevitably grow over time.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Built-in Garbage Collection:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Archiving:","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.02642952,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you prefer to keep a permanent, searchable \"life log\" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. 
You could script a simple cron job to regularly","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.22041224,"height":0.057861134},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"older","depth":29,"bounds":{"left":0.18035239,"top":0.0,"width":0.015292553,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.22174202,"height":0.057861134},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"bounds":{"left":0.075465426,"top":0.029928172,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"bounds":{"left":0.08610372,"top":0.029928172,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"bounds":{"left":0.09674202,"top":0.029928172,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"bounds":{"left":0.107380316,"top":0.029928172,"width":0.010638298,"height":0.025538707},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more 
options","depth":23,"bounds":{"left":0.11801862,"top":0.029928172,"width":0.010638298,"height":0.025538707},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy prompt","depth":21,"bounds":{"left":0.12566489,"top":0.0905826,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Edit","depth":21,"bounds":{"left":0.14029256,"top":0.0905826,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said yes I will do that. Is there a way to setup languages to transcription?","depth":21,"bounds":{"left":0.16023937,"top":0.100159615,"width":0.15026596,"height":0.03830806},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"bounds":{"left":0.068484046,"top":0.1009577,"width":0.019946808,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"yes I will do that. 
Is there a way to setup languages to transcription?","depth":23,"bounds":{"left":0.16023937,"top":0.10175578,"width":0.12849069,"height":0.035514764},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"bounds":{"left":0.3025266,"top":0.17039107,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"bounds":{"left":0.09208777,"top":0.17278531,"width":0.030917553,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"bounds":{"left":0.08976064,"top":0.21428572,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"bounds":{"left":0.08976064,"top":0.21628092,"width":0.04105718,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"That is a solid plan. 
Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.","depth":27,"bounds":{"left":0.0787899,"top":0.21747805,"width":0.23088431,"height":0.057861134},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As for setting up languages for transcription,","depth":27,"bounds":{"left":0.0787899,"top":0.28850758,"width":0.10920878,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"yes, you can configure it","depth":27,"bounds":{"left":0.18799867,"top":0.28850758,"width":0.06333112,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":", but it helps to understand how ScreenPipe handles language natively first.","depth":27,"bounds":{"left":0.0787899,"top":0.28850758,"width":0.20994017,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How the Transcription Engine Handles Language","depth":26,"bounds":{"left":0.0787899,"top":0.3499601,"width":0.234375,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How the Transcription Engine Handles Language","depth":27,"bounds":{"left":0.0787899,"top":0.35155627,"width":0.12549867,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses a","depth":27,"bounds":{"left":0.0787899,"top":0.37789306,"width":0.072972074,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"multilingual Whisper 
model","depth":27,"bounds":{"left":0.15176196,"top":0.37789306,"width":0.07047872,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"and sets the language configuration to","depth":27,"bounds":{"left":0.0787899,"top":0.37789306,"width":0.23321144,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Auto-Detect","depth":27,"bounds":{"left":0.08510638,"top":0.39864326,"width":0.032247342,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"bounds":{"left":0.11735372,"top":0.39864326,"width":0.0013297872,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.","depth":27,"bounds":{"left":0.0787899,"top":0.42817238,"width":0.23038563,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Advantage:","depth":29,"bounds":{"left":0.09142287,"top":0.47845173,"width":0.040724736,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.","depth":29,"bounds":{"left":0.09142287,"top":0.47845173,"width":0.22174202,"height":0.057861134},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The 
Drawback:","depth":29,"bounds":{"left":0.09142287,"top":0.5494813,"width":0.038896278,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.","depth":29,"bounds":{"left":0.09142287,"top":0.5494813,"width":0.20744681,"height":0.09936153},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How to Force a Specific Language","depth":26,"bounds":{"left":0.0787899,"top":0.67318434,"width":0.234375,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How to Force a Specific Language","depth":27,"bounds":{"left":0.0787899,"top":0.67478055,"width":0.08759973,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. 
This also slightly speeds up the processing because the model skips the detection phase.","depth":27,"bounds":{"left":0.0787899,"top":0.70111734,"width":0.2278923,"height":0.057861134},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you are using the Desktop App UI:","depth":27,"bounds":{"left":0.0787899,"top":0.7721468,"width":0.09275266,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Open the ScreenPipe settings.","depth":29,"bounds":{"left":0.09142287,"top":0.801676,"width":0.07347074,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Navigate to the","depth":29,"bounds":{"left":0.09142287,"top":0.8312051,"width":0.038231384,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio","depth":29,"bounds":{"left":0.12965426,"top":0.8312051,"width":0.014960106,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"bounds":{"left":0.14461437,"top":0.8312051,"width":0.0078125,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"AI/Transcription","depth":29,"bounds":{"left":0.15242687,"top":0.8312051,"width":0.041888297,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"section.","depth":29,"bounds":{"left":0.19431517,"top":0.8312051,"width":0.02044548,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Look for 
the","depth":29,"bounds":{"left":0.09142287,"top":0.8607342,"width":0.030585106,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper Language","depth":29,"bounds":{"left":0.12200798,"top":0.8607342,"width":0.04837101,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"bounds":{"left":0.17037898,"top":0.8607342,"width":0.0078125,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Transcription Language","depth":29,"bounds":{"left":0.17819148,"top":0.8607342,"width":0.061502658,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"dropdown.","depth":29,"bounds":{"left":0.23969415,"top":0.8607342,"width":0.027260639,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Change it from \"Auto\" to your specific language (e.g., English, Bulgarian, or Slovak).","depth":29,"bounds":{"left":0.09142287,"top":0.8902634,"width":0.20079787,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you are running ScreenPipe via CLI/Config:","depth":27,"bounds":{"left":0.0787899,"top":0.92378294,"width":0.116023935,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You can modify your underlying configuration (usually found 
in","depth":27,"bounds":{"left":0.0787899,"top":0.9445331,"width":0.15159574,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/pipe.json","depth":28,"bounds":{"left":0.23238032,"top":0.94573027,"width":0.064328454,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:","depth":27,"bounds":{"left":0.0787899,"top":0.9445331,"width":0.234375,"height":0.05546689},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"(Slovak)","depth":29,"bounds":{"left":0.14012633,"top":1.0,"width":0.020777926,"height":-0.015562654},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"(Bulgarian)","depth":29,"bounds":{"left":0.14012633,"top":1.0,"width":0.027925532,"height":-0.04509175},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"(English)","depth":29,"bounds":{"left":0.14012633,"top":1.0,"width":0.02244016,"height":-0.07462096},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":23,"on_screen":false,"help_text":"","role_description":"toggle 
button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":23,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Redo","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Share and export","depth":22,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":22,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXTextArea","text":"Ask Gemini","depth":20,"bounds":{"left":0.08211436,"top":0.83439744,"width":0.22573139,"height":0.01915403},"on_screen":true,"value":"Ask Gemini","help_text":"","role_description":"text entry area","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Ask Gemini","depth":21,"bounds":{"left":0.08211436,"top":0.8347965,"width":0.030086435,"height":0.018355945},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Open upload file 
menu","depth":20,"bounds":{"left":0.078125,"top":0.87031126,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Tools","depth":18,"bounds":{"left":0.094082445,"top":0.87031126,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Open mode picker","depth":20,"bounds":{"left":0.27044547,"top":0.867917,"width":0.026097074,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Pro","depth":23,"bounds":{"left":0.2757646,"top":0.87669593,"width":0.007480053,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Microphone","depth":19,"bounds":{"left":0.29853722,"top":0.867917,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false}]...
|
-4561217559374952988
|
8632611141304593373
|
visual_change
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said
Explain to me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work-in-progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
- System Audio: Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
- Microphone Input: Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
- The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
- Diarization: As it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
- Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
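To make the FTS5 point concrete, here is a minimal sketch of that kind of indexed search using Python's built-in sqlite3 module. The audio_transcriptions table and its columns are illustrative stand-ins, not ScreenPipe's actual schema.

```python
import sqlite3

# Illustrative FTS5 index; table and column names are stand-ins,
# not ScreenPipe's real schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE audio_transcriptions USING fts5(transcription, timestamp)"
)
conn.executemany(
    "INSERT INTO audio_transcriptions VALUES (?, ?)",
    [
        ("let's move the backlog review to Friday", "2026-05-12T07:41:02"),
        ("the RAID array finished rebuilding overnight", "2026-05-12T09:15:44"),
    ],
)

# MATCH hits the FTS5 index, so the lookup stays fast even over
# months of accumulated audio.
rows = conn.execute(
    "SELECT transcription, timestamp FROM audio_transcriptions "
    "WHERE audio_transcriptions MATCH ?",
    ("backlog",),
).fetchall()
print(rows)
```

The same MATCH syntax works from any SQLite client, including the SQLite Web UI you already have open against db.sqlite.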
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
- Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
- Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
- Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
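The buffer-queue-finalize flow above can be sketched as a tiny producer/consumer pipeline. The transcribe() function here is a trivial stand-in for a real Whisper call, just to show how chunks back up in the queue when capture outpaces the transcriber.

```python
import queue
import threading

chunk_queue = queue.Queue()  # the WIP backlog
finalized = []               # stands in for rows committed to SQLite

def transcribe(chunk):
    return chunk.upper()  # stand-in for an actual Whisper invocation

def worker():
    # Drains the queue one chunk at a time, like the transcription engine.
    while True:
        chunk = chunk_queue.get()
        if chunk is None:  # sentinel: capture stopped
            break
        finalized.append(transcribe(chunk))
        chunk_queue.task_done()

t = threading.Thread(target=worker)
t.start()

# Capture side: chunks arrive continuously and queue up.
for chunk in ["chunk one", "chunk two", "chunk three"]:
    chunk_queue.put(chunk)
chunk_queue.put(None)
t.join()
print(finalized)  # all three chunks, transcribed in arrival order
```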
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
- The SQLite Database (db.sqlite): This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
- The data folder: This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
- Temp Files: If you see rapidly changing files, temporary audio chunks, or locked database journals (such as a -wal or -journal file next to the database), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
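A rough way to eyeball this from a script, assuming the sidecar naming conventions above (an educated guess based on standard SQLite behavior, not documented ScreenPipe internals):

```python
from pathlib import Path

def processing_indicators(data_dir):
    """Heuristic scan of a ScreenPipe-style data directory.

    Returns (media_files, wip_markers). WIP markers here are SQLite
    journal sidecars (-wal/-journal) and .tmp files; the exact names
    are an assumption, not documented ScreenPipe behavior.
    """
    media, wip = [], []
    for p in Path(data_dir).iterdir():
        if p.suffix == ".mp4":
            media.append(p.name)
        elif p.name.endswith(("-wal", "-journal", ".tmp")):
            wip.append(p.name)
    return sorted(media), sorted(wip)
```

Run it against ~/.screenpipe/data/ and a non-empty second list suggests the engine is still working through a backlog.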
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text: it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
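Assuming the _YYYY-MM-DD_HH-MM-SS suffix in those file names is the recording start time (consistent with the names you listed), the playback offset for a transcribed phrase can be derived like this. A sketch, not ScreenPipe's actual player code:

```python
from datetime import datetime

def playback_offset(filename, spoken_at):
    """Seconds into the file where a transcribed phrase was spoken.

    Assumes the trailing _YYYY-MM-DD_HH-MM-SS in the file name is the
    recording start time; spoken_at is the transcript's timestamp.
    """
    stem = filename.rsplit(".", 1)[0]
    # The last two underscore-separated fields are the date and time.
    date_part, time_part = stem.rsplit("_", 2)[-2:]
    start = datetime.strptime(f"{date_part}_{time_part}", "%Y-%m-%d_%H-%M-%S")
    return (spoken_at - start).total_seconds()

offset = playback_offset(
    "soundcore AeroClip (input)_2026-05-12_07-40-48.mp4",
    datetime(2026, 5, 12, 7, 42, 0),
)
print(offset)  # 72.0 seconds into the recording
```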
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth to verify what was actually said.
How the Database Views Them
If you were to open up your db.sqlite and query the tables (e.g., audio_transcriptions), you would see that the database treats the ~/.screenpipe/data/ folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like soundcore AeroClip (input)_2026-05-12_07-40-48.mp4.
What happens if you delete them? If you manually rm those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
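If you ever want to audit this before deleting anything, an orphan check might look like the following. The table and column names are assumptions carried over from the example above, so verify them against your actual db.sqlite schema first.

```python
import os
import sqlite3

def orphaned_transcripts(db_path, data_dir):
    """Return transcript rows whose referenced media file no longer exists.

    Assumes a table named audio_transcriptions with id and file_path
    columns; the real names in ScreenPipe's db.sqlite may differ.
    """
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT id, file_path FROM audio_transcriptions"
    ).fetchall()
    conn.close()
    return [
        (row_id, path) for row_id, path in rows
        if not os.path.exists(os.path.join(data_dir, path))
    ]
```

Any rows it returns are transcripts you can still search but can no longer play back.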
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection: ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving: If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly move older files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
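That cron-job idea can be sketched roughly as follows. The paths, the NAS mount point, and the 30-day window are placeholders to adjust, and you should test on a copy of the data first:

```shell
# Rough archival sketch: move media older than N days to the NAS and leave
# a symlink behind so existing db.sqlite file references still resolve.
archive_old_media() {
  local data_dir="$1" archive_dir="$2" days="$3"
  find "$data_dir" -name '*.mp4' -mtime +"$days" | while IFS= read -r f; do
    local rel="${f#"$data_dir/"}"          # path relative to the data dir
    local dest="$archive_dir/$rel"
    mkdir -p "$(dirname "$dest")"
    mv "$f" "$dest" && ln -s "$dest" "$f"  # symlink keeps old path valid
  done
}

# Usage (assumed NAS mount point), e.g. saved as a script and run nightly:
# archive_old_media "$HOME/.screenpipe/data" /Volumes/nas/screenpipe 30
```

One design note: the symlink targets must be absolute (they are here, since the archive dir is absolute), otherwise playback would break the moment the working directory changes.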
You said
yes I will do that. Is there a way to set up languages for transcription?
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription: yes, you can configure it, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
By default, ScreenPipe uses a multilingual Whisper model and sets the language configuration to Auto-Detect. When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage: If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback: Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
Open the ScreenPipe settings.
Navigate to the Audio or AI/Transcription section.
Look for the Whisper Language or Transcription Language dropdown.
Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).
If you are running ScreenPipe via CLI/Config:
You can modify your underlying configuration (usually found in ~/.screenpipe/pipe.json or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO 639-1 language codes: sk (Slovak), bg (Bulgarian), en (English).
If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
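As a purely illustrative sketch of the CLI route: the flag name below is an assumption based on common conventions, so verify it against the daemon's own help output before relying on it.

```shell
# Hypothetical flag name -- confirm the real option with `screenpipe --help`.
screenpipe --language sk    # force Slovak (ISO 639-1 code) transcription
```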
Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy 
prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. 
Are these used for anything after transcribing and storing in sqlite db?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Expand","depth":21,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Yes, those","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files are absolutely still used after the transcription process is complete. 
They are not temporary processing chunks; they act as your permanent media archive.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is exactly what those files are doing sitting in your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"directory and how they interact with the rest of the system.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"1. The \"Time Machine\" Playback","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. The \"Time Machine\" Playback","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The primary reason ScreenPipe keeps these files is for audio playback. 
When Whisper transcribes your meetings or ambient audio, it writes the text into the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"database along with","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"precise timestamps and file path references","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. Re-processing and Model Upgrades","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. Re-processing and Model Upgrades","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Local LLMs and transcription models are improving rapidly. 
Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The Source of Truth for Hallucinations","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The Source of Truth for Hallucinations","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. 
If a database query returns a completely nonsensical sentence, that","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file is your fallback source of truth to verify what was actually said.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How the Database Views Them","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How the Database Views Them","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you were to open up your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"and query the tables (e.g.,","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"audio_transcriptions","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), you would see that the database treats the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folder essentially as a blob storage backend. 
The tables contain columns acting as soft foreign keys pointing to filenames like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"soundcore AeroClip (input)_2026-05-12_07-40-48.mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"What happens if you delete them?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you manually","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"rm","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. 
However, any attempt to play back the audio for those older transcripts will silently fail or throw a \"file not found\" error in the background logs.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Managing the Storage Footprint","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Managing the Storage Footprint","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe records continuously, this folder will inevitably grow over time.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Built-in Garbage Collection:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Archiving:","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.02642952,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you prefer to keep a permanent, searchable \"life log\" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. 
You could script a simple cron job to regularly","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.22041224,"height":0.057861134},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"older","depth":29,"bounds":{"left":0.18035239,"top":0.0,"width":0.015292553,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.22174202,"height":0.057861134},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"bounds":{"left":0.075465426,"top":0.029928172,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"bounds":{"left":0.08610372,"top":0.029928172,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"bounds":{"left":0.09674202,"top":0.029928172,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"bounds":{"left":0.107380316,"top":0.029928172,"width":0.010638298,"height":0.025538707},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more 
options","depth":23,"bounds":{"left":0.11801862,"top":0.029928172,"width":0.010638298,"height":0.025538707},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy prompt","depth":21,"bounds":{"left":0.12566489,"top":0.0905826,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Edit","depth":21,"bounds":{"left":0.14029256,"top":0.0905826,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said yes I will do that. Is there a way to setup languages to transcription?","depth":21,"bounds":{"left":0.16023937,"top":0.100159615,"width":0.15026596,"height":0.03830806},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"bounds":{"left":0.068484046,"top":0.1009577,"width":0.019946808,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"yes I will do that. 
Is there a way to setup languages to transcription?","depth":23,"bounds":{"left":0.16023937,"top":0.10175578,"width":0.12849069,"height":0.035514764},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"bounds":{"left":0.3025266,"top":0.17039107,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"bounds":{"left":0.09208777,"top":0.17278531,"width":0.030917553,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"bounds":{"left":0.08976064,"top":0.21428572,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"bounds":{"left":0.08976064,"top":0.21628092,"width":0.04105718,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.","depth":27,"bounds":{"left":0.0787899,"top":0.21747805,"width":0.23088431,"height":0.057861134},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
|
7478882291332896757
|
8487585695044103125
|
visual_change
|
accessibility
|
NULL
|
Conversation with Gemini
You said
Explain to me: how does the ScreenPipe audio recording work? Where does it start? How is it getting transferred and transcribed? Is there some work-in-progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
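The chunking step is easy to sketch generically. The 30-second window and 16 kHz sample rate below match Whisper's usual input format, but they are assumptions here, not confirmed Screenpipe values:

```python
# Generic sketch: slice a continuous PCM stream into fixed-length
# segments ready for transcription. 30 s / 16 kHz match Whisper's usual
# input format; Screenpipe's actual chunk size may differ.
SAMPLE_RATE = 16_000
CHUNK_SECONDS = 30

def chunks(samples):
    size = SAMPLE_RATE * CHUNK_SECONDS
    for i in range(0, len(samples), size):
        yield samples[i:i + size]

# 75 s of audio splits into two full chunks plus a 15 s remainder.
pieces = list(chunks([0.0] * (SAMPLE_RATE * 75)))
print([len(p) / SAMPLE_RATE for p in pieces])  # [30.0, 30.0, 15.0]
```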
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: As it transcribes the text, the engine also performs "diarization" (a technical term for speaker identification). It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
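You can try that FTS5 lookup directly with Python's built-in sqlite3 module. The table and column names below are made up for illustration; they are not Screenpipe's actual schema:

```python
import sqlite3

# Illustrative FTS5 table (names are hypothetical, not Screenpipe's schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE VIRTUAL TABLE transcripts_fts USING fts5(transcription);
    INSERT INTO transcripts_fts VALUES ('we agreed to ship the beta on Friday');
    INSERT INTO transcripts_fts VALUES ('lunch order for the team');
""")

# MATCH uses the FTS5 index, so this stays fast even over months of audio.
rows = conn.execute(
    "SELECT transcription FROM transcripts_fts WHERE transcripts_fts MATCH ?",
    ("beta",),
).fetchall()
print(rows)  # [('we agreed to ship the beta on Friday',)]
```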
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database: This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data folders: This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary chunks, or locked database journals, this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
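The "done vs. not done" check above can also be done programmatically, by diffing the media files on disk against the paths recorded in the database. The audio_transcriptions table is named later in this conversation, but the file_path column and exact layout below are assumptions; inspect your own db.sqlite first:

```python
import sqlite3
from pathlib import Path

def untranscribed(data_dir: str, db_path: str) -> list[str]:
    """List media files with no transcript row yet (the WIP backlog).

    Assumes a table audio_transcriptions with a file_path column;
    verify against your actual Screenpipe schema before relying on this.
    """
    conn = sqlite3.connect(db_path)
    done = {row[0] for row in conn.execute(
        "SELECT DISTINCT file_path FROM audio_transcriptions")}
    conn.close()
    on_disk = {str(p) for p in Path(data_dir).rglob("*.mp4")}
    return sorted(on_disk - done)  # files still waiting to be processed
```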
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in the SQLite DB?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text: it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
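The seek arithmetic behind that playback is simple to sketch: the filenames you listed embed the recording's start time, so the offset is just the difference between the transcript's timestamp and that start time. The parsing below is an assumption based on your example filenames, not confirmed Screenpipe behavior:

```python
from datetime import datetime

def seek_offset(filename: str, spoken_at: str) -> float:
    """Seconds into the file at which a transcript timestamp was spoken.

    Assumes filenames like 'soundcore AeroClip (input)_2026-05-12_07-40-48.mp4'
    encode the recording's start time after the final ')_'.
    """
    stamp = filename.rsplit(")_", 1)[1].removesuffix(".mp4")
    start = datetime.strptime(stamp, "%Y-%m-%d_%H-%M-%S")
    return (datetime.fromisoformat(spoken_at) - start).total_seconds()

# A phrase transcribed at 07:42:18 sits 90 s into a file started at 07:40:48.
print(seek_offset("soundcore AeroClip (input)_2026-05-12_07-40-48.mp4",
                  "2026-05-12T07:42:18"))  # 90.0
```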
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth to verify what was actually said.
How the Database Views Them
If you were to open up your db.sqlite and query the tables (e.g., audio_transcriptions), you would see that the database treats the ~/.screenpipe/data/ folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like soundcore AeroClip (input)_2026-05-12_07-40-48.mp4.
What happens if you delete them?
If you manually rm those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection: ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving: If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly move older files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
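That archive-and-symlink job can be sketched in a few lines, assuming the NAS share is mounted locally. The directory paths and the 30-day cutoff are illustrative, not Screenpipe defaults:

```python
import shutil
import time
from pathlib import Path

def offload(data_dir: str, archive_dir: str, days: int = 30) -> None:
    """Move media older than `days` to the archive, leaving symlinks behind.

    Sketch only: assumes archive_dir is a locally mounted NAS path and that
    Screenpipe resolves symlinks when playing back (worth verifying).
    """
    cutoff = time.time() - days * 86400
    for f in Path(data_dir).glob("*.mp4"):
        if f.is_symlink() or f.stat().st_mtime > cutoff:
            continue  # already offloaded, or still recent
        dest = Path(archive_dir) / f.name
        shutil.move(str(f), dest)  # copy to the archive, remove the local file
        f.symlink_to(dest)         # DB file-path references keep resolving
```

Run this from cron (or launchd on macOS) on whatever schedule matches how fast the folder grows.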
You said
Yes, I will do that. Is there a way to set up languages for transcription?
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable....
You said
Explain to me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work-in-progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
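The chunking step can be illustrated with a toy splitter. The 30-second chunk length and 16 kHz mono 16-bit PCM format here are assumptions for the example, not ScreenPipe's confirmed settings:

```python
def chunk_pcm(stream: bytes, sample_rate: int = 16000,
              bytes_per_sample: int = 2, chunk_seconds: int = 30):
    """Split a continuous mono PCM byte stream into fixed-length chunks,
    the way a 24/7 recorder must before handing audio to a transcriber.

    The final chunk may be shorter than chunk_seconds.
    """
    step = sample_rate * bytes_per_sample * chunk_seconds
    for offset in range(0, len(stream), step):
        yield stream[offset:offset + step]
```

In a real recorder this runs over a live buffer rather than a finished byte string, but the cut points work the same way.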
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage:
The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
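The FTS5 search can be sketched as a standalone query. The `transcripts_fts` table name is illustrative, not ScreenPipe's actual schema; check yours with `.schema` in the sqlite3 shell:

```python
import sqlite3


def search_transcripts(db_path: str, phrase: str) -> list[str]:
    """Full-text search over transcribed audio using SQLite FTS5.

    MATCH queries hit the FTS index, so this stays fast even over
    months of accumulated transcripts.
    """
    conn = sqlite3.connect(db_path)
    try:
        return [row[0] for row in conn.execute(
            "SELECT text FROM transcripts_fts WHERE transcripts_fts MATCH ?",
            (phrase,),
        )]
    finally:
        conn.close()
```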
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
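The buffer, queue, and commit steps above can be sketched as a simple drain loop. The `transcripts(ts, text)` table and the `transcribe` callable are placeholders for whatever the real engine provides, not ScreenPipe's actual API:

```python
import queue
import sqlite3


def drain_queue(pending: "queue.Queue[bytes]", conn: sqlite3.Connection,
                transcribe) -> int:
    """Consume buffered audio chunks and commit each transcript with a
    timestamp, mirroring the WIP stage: buffer -> queue -> finalize."""
    done = 0
    while not pending.empty():
        chunk = pending.get_nowait()  # WIP: the chunk leaves the buffer
        text = transcribe(chunk)      # the slow, CPU/GPU-bound step
        conn.execute(
            "INSERT INTO transcripts (ts, text) "
            "VALUES (strftime('%s', 'now'), ?)",
            (text,),
        )
        done += 1
    conn.commit()  # finalization: only now is the text durably "done"
    return done
```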
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database:
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The media folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary chunks, or locked database journals, this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
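One hedged way to spot the WIP stage from the folder alone is to look for SQLite's standard `-wal` / `-journal` sidecar files next to the database; a non-empty sidecar means writes are in flight. The database filename is taken as a parameter because the exact name may vary:

```python
import os


def pipeline_status(screenpipe_dir: str, db_name: str) -> str:
    """Heuristic done-vs-WIP check using only the folder contents.

    SQLite keeps uncommitted work in -wal (WAL mode) or -journal
    (rollback mode) sidecar files beside the database.
    """
    db = os.path.join(screenpipe_dir, db_name)
    sidecars = [db + "-wal", db + "-journal"]
    busy = any(os.path.exists(p) and os.path.getsize(p) > 0 for p in sidecars)
    return "processing backlog (WIP)" if busy else "caught up"
```

This is only a heuristic: a long-lived WAL file can also just mean the process is running normally, so treat it as a hint rather than proof.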
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — 
Archive","depth":5,"bounds":{"left":0.013297873,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: 
db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New 
Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat 
settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share 
conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? 
How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. 
Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. 
How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. 
Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. 
The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. 
If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. 
If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). 
Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false}]...
|
2844658784128246394
|
9212447352970443671
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: the raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: as it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage: the final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
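The kind of instant search FTS5 enables can be demonstrated with Python's built-in sqlite3 module. The table name `transcripts_fts` here is invented for the sketch; ScreenPipe's real schema may differ.

```python
import sqlite3

# Minimal FTS5 demo of the kind of search the database enables.
# The table name "transcripts_fts" is illustrative, not ScreenPipe's
# actual schema.
con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE transcripts_fts USING fts5(text, timestamp)")
con.execute(
    "INSERT INTO transcripts_fts VALUES (?, ?)",
    ("let's move the quarterly review to Thursday", "2026-05-12T07:41:00"),
)
con.execute(
    "INSERT INTO transcripts_fts VALUES (?, ?)",
    ("unrelated chatter about lunch", "2026-05-12T08:00:00"),
)
# MATCH uses the full-text index, so this stays fast even over
# months of transcripts.
rows = con.execute(
    "SELECT timestamp, text FROM transcripts_fts WHERE transcripts_fts MATCH ?",
    ("quarterly",),
).fetchall()
```

Only the row containing "quarterly" comes back, along with its timestamp, which is exactly what a "find that phrase from three weeks ago" search needs.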
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
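The three steps above amount to a producer/consumer queue: capture produces chunks faster than transcription can consume them, so some chunks are always "in flight". A toy model, with all names invented for illustration:

```python
from collections import deque

# Toy model of the WIP stage: captured chunks wait in a FIFO queue
# until a slower transcription worker drains them. All names here
# are illustrative, not ScreenPipe internals.
queue = deque()

def capture(chunk_id):
    # Buffering: a freshly recorded chunk lands in the queue.
    queue.append(chunk_id)

def transcribe_one(db):
    # Finalization: the worker takes the oldest chunk and commits
    # its "transcript" to the database.
    chunk_id = queue.popleft()
    db.append((chunk_id, f"text for chunk {chunk_id}"))

db = []
for i in range(5):       # capture outpaces transcription...
    capture(i)
transcribe_one(db)
transcribe_one(db)       # ...so three chunks remain in the WIP queue
```

The chunks still sitting in `queue` correspond to the backlog you can sometimes observe on disk as temporary files.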
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (db.sqlite): this is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data folders: this is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: if you see rapidly changing files, temporary chunks, or locked database journals, this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
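One concrete signal of an in-flight database: when SQLite runs in write-ahead-log (WAL) mode, it keeps side files next to the main database (`db.sqlite-wal`, `db.sqlite-shm`) while a writer connection is open. Whether ScreenPipe actually uses WAL mode is an assumption here; the demo below uses a throwaway database to show what those side files look like.

```python
import os
import sqlite3
import tempfile

# Demonstrate the WAL side files on a throwaway database. Whether
# ScreenPipe itself runs in WAL mode is an assumption; this only
# shows what such journal files look like on disk.
tmp = tempfile.mkdtemp()
db_path = os.path.join(tmp, "db.sqlite")
con = sqlite3.connect(db_path)
con.execute("PRAGMA journal_mode=WAL")
con.execute("CREATE TABLE t(x)")
con.execute("INSERT INTO t VALUES (1)")
con.commit()

# While the connection is open, WAL mode keeps journal side files
# next to the main database file.
side_files = sorted(f for f in os.listdir(tmp) if f.endswith(("-wal", "-shm")))
```

Seeing such journals next to ScreenPipe's database is a hint that a writer is active; once all connections close cleanly, the WAL contents are checkpointed back into the main file.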
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said:
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in the SQLite db?
Gemini said:
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive. Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references. When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text: it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
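That "text plus file path plus timestamp" lookup can be sketched with sqlite3. The column names (`file_path`, `offset_seconds`) are invented for illustration; ScreenPipe's real tables differ in detail, though the document does name an `audio_transcriptions` table.

```python
import sqlite3

# Illustrative lookup: given a matched transcription row, recover the
# media file and the playback offset. Column names are invented for
# this sketch; the real schema may differ.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE audio_transcriptions("
    "text TEXT, file_path TEXT, offset_seconds REAL)"
)
con.execute(
    "INSERT INTO audio_transcriptions VALUES (?, ?, ?)",
    ("ship it on Friday",
     "soundcore AeroClip (input)_2026-05-12_07-40-48.mp4",
     312.5),
)
path, offset = con.execute(
    "SELECT file_path, offset_seconds FROM audio_transcriptions "
    "WHERE text LIKE '%Friday%'"
).fetchone()
# A player would then open `path` and seek to `offset` seconds.
```

This is why deleting the .mp4 files breaks playback but not search: the text and the pointer survive in the database, but the pointer dangles.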
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
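What makes batch re-processing practical is that the archive filenames themselves encode the device, direction, and start time. A sketch of parsing that pattern, inferred from the example filenames quoted above:

```python
import re
from datetime import datetime

# Parse device, direction, and start time out of archive filenames
# like "System Audio (output)_2026-05-11_06-17-14.mp4". The pattern
# is inferred from the examples in this conversation, not from
# ScreenPipe's source.
PATTERN = re.compile(
    r"^(?P<device>.+) \((?P<direction>input|output)\)_"
    r"(?P<ts>\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})\.mp4$"
)

def parse_name(name):
    """Return (device, direction, start_datetime), or None if no match."""
    m = PATTERN.match(name)
    if not m:
        return None
    return (
        m["device"],
        m["direction"],
        datetime.strptime(m["ts"], "%Y-%m-%d_%H-%M-%S"),
    )

info = parse_name("System Audio (output)_2026-05-11_06-17-14.mp4")
```

A re-processing script could walk ~/.screenpipe/data/, parse each name this way, and feed the files to a newer model in chronological order.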
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth to verify what was actually said.
How the Database Views Them
If you were to open up your db.sqlite and query the tables (e.g., audio_transcriptions), you would see that the database treats the ~/.screenpipe/data/ folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like soundcore AeroClip (input)_2026-05-12_07-40-48.mp4...
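Because those references are only "soft", nothing stops the archive and the database from drifting apart. A sketch of an integrity check that surfaces both directions of drift; the table and column names are illustrative, and the demo runs against a throwaway directory and in-memory database:

```python
import os
import sqlite3
import tempfile

# Cross-check media files on disk against file references in the
# database. Table/column names are illustrative; the directory and
# database here are throwaway stand-ins for ~/.screenpipe/data/
# and db.sqlite.
data_dir = tempfile.mkdtemp()
for name in ("a.mp4", "b.mp4", "orphan.mp4"):
    open(os.path.join(data_dir, name), "w").close()

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE audio_transcriptions(file_path TEXT)")
con.executemany(
    "INSERT INTO audio_transcriptions VALUES (?)",
    [("a.mp4",), ("b.mp4",)],
)

referenced = {row[0] for row in con.execute(
    "SELECT DISTINCT file_path FROM audio_transcriptions")}
on_disk = set(os.listdir(data_dir))

orphans = on_disk - referenced   # media never indexed (possible WIP backlog)
missing = referenced - on_disk   # db rows whose media was deleted
```

Orphans are candidates for the "not yet done" pile from the first question; missing entries mean playback for those transcripts is gone.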
prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. 
Are these used for anything after transcribing and storing in sqlite db?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Expand","depth":21,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Yes, those","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files are absolutely still used after the transcription process is complete. 
They are not temporary processing chunks; they act as your permanent media archive.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is exactly what those files are doing sitting in your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"directory and how they interact with the rest of the system.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"1. The \"Time Machine\" Playback","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. The \"Time Machine\" Playback","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The primary reason ScreenPipe keeps these files is for audio playback. 
When Whisper transcribes your meetings or ambient audio, it writes the text into the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"database along with","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"precise timestamps and file path references","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. Re-processing and Model Upgrades","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. Re-processing and Model Upgrades","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Local LLMs and transcription models are improving rapidly. 
Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The Source of Truth for Hallucinations","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The Source of Truth for Hallucinations","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. 
If a database query returns a completely nonsensical sentence, that","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file is your fallback source of truth to verify what was actually said.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How the Database Views Them","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How the Database Views Them","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you were to open up your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"and query the tables (e.g.,","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"audio_transcriptions","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), you would see that the database treats the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folder essentially as a blob storage backend. 
The tables contain columns acting as soft foreign keys pointing to filenames like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"soundcore AeroClip (input)_2026-05-12_07-40-48.mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
|
6441156580129037475
|
8635993256255716309
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Explain to me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work-in-progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine. ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system's audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
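The FTS5 search described above can be demonstrated with Python's bundled sqlite3 module. The table layout below is illustrative only, not ScreenPipe's actual schema:

```python
import sqlite3

# In-memory demo of the FTS5 pattern ScreenPipe's search relies on.
# Requires an SQLite build with the FTS5 extension (standard in recent
# Python distributions). Table/column names here are made up for the demo.
con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE transcripts USING fts5(device, text)")
con.execute("INSERT INTO transcripts VALUES (?, ?)",
            ("MacBook Pro Microphone", "let's move the standup to Thursday"))
con.execute("INSERT INTO transcripts VALUES (?, ?)",
            ("System Audio", "the NAS resilver finished overnight"))

# MATCH runs a tokenized full-text query over the indexed columns,
# so any word from weeks of transcripts is found without a full scan.
rows = con.execute(
    "SELECT device, text FROM transcripts WHERE transcripts MATCH 'standup'"
).fetchall()
print(rows)  # -> [('MacBook Pro Microphone', "let's move the standup to Thursday")]
```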
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
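The buffer → queue → finalize flow above is a classic producer/consumer pipeline. A toy model of that shape (not ScreenPipe's actual code):

```python
import queue
import threading

# Toy model of the WIP stage: a capture thread enqueues raw chunks,
# a worker "transcribes" them and commits the text to a ledger list,
# which stands in for the SQLite database.
chunks: queue.Queue = queue.Queue()
ledger: list[tuple[int, str]] = []

def capture(n_chunks: int) -> None:
    for i in range(n_chunks):
        chunks.put((i, f"raw-audio-{i}"))  # buffering
    chunks.put(None)                       # end of stream sentinel

def transcribe_worker() -> None:
    # Chunks queue up here whenever transcription is slower than capture.
    while (item := chunks.get()) is not None:
        ts, raw = item
        ledger.append((ts, raw.replace("raw-audio", "text")))  # finalization

t = threading.Thread(target=transcribe_worker)
t.start()
capture(3)
t.join()
print(ledger)  # -> [(0, 'text-0'), (1, 'text-1'), (2, 'text-2')]
```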
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (db.sqlite): This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data folders: This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary chunks, or locked database journals, this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
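The "done vs. pending" check described above can be sketched in a few lines. The audio_transcriptions table and file_path column are guesses at ScreenPipe's schema; verify them against your own db.sqlite (e.g. with .schema) before relying on this:

```python
import sqlite3
from pathlib import Path

def find_unprocessed(data_dir: str, db_path: str) -> list[str]:
    """Return media files on disk that have no transcription row yet.

    Assumes a table audio_transcriptions with a file_path column, which
    is an assumption about ScreenPipe's schema, not documented fact.
    """
    con = sqlite3.connect(db_path)
    done = {row[0] for row in
            con.execute("SELECT DISTINCT file_path FROM audio_transcriptions")}
    con.close()
    # Anything in the archive folder not referenced by the DB is still pending.
    return sorted(str(p) for p in Path(data_dir).rglob("*.mp4")
                  if str(p) not in done)
```

Run against ~/.screenpipe, an empty result means the transcription backlog has fully caught up.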
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive. Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
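Since each filename encodes the clip's start time, jumping to "the exact second the keyword was spoken" is just a timestamp subtraction. A minimal sketch (where ScreenPipe actually stores the utterance timestamp is an assumption here; the arithmetic is the same either way):

```python
from datetime import datetime

# "System Audio (output)_2026-05-11_06-17-14.mp4" began at 06:17:14,
# per the start time embedded in the filename.
clip_start = datetime(2026, 5, 11, 6, 17, 14)
# Timestamp of the searched-for utterance, as the database would record it.
utterance_at = datetime(2026, 5, 11, 6, 19, 2)

offset_seconds = (utterance_at - clip_start).total_seconds()
print(f"seek to {offset_seconds:.0f}s into the clip")  # seek to 108s into the clip
```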
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth to verify what was actually said.
How the Database Views Them
If you were to open up your db.sqlite and query the tables (e.g., audio_transcriptions), you would see that the database treats the ~/.screenpipe/data/ folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like soundcore AeroClip (input)_2026-05-12_07-40-48.mp4.
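That filename convention (device name, input/output direction, start timestamp) can be parsed mechanically. The pattern below is inferred from the example filenames in this conversation, not from ScreenPipe documentation:

```python
import re
from datetime import datetime

# Matches names like "soundcore AeroClip (input)_2026-05-12_07-40-48.mp4".
# Pattern is reverse-engineered from observed filenames; treat as a guess.
PATTERN = re.compile(
    r"^(?P<device>.+) \((?P<direction>input|output)\)_"
    r"(?P<date>\d{4}-\d{2}-\d{2})_(?P<time>\d{2}-\d{2}-\d{2})\.mp4$"
)

def parse_clip_name(name: str) -> dict:
    """Split a ScreenPipe media filename into device, direction, start time."""
    m = PATTERN.match(name)
    if not m:
        raise ValueError(f"unrecognized clip name: {name}")
    started = datetime.strptime(f"{m['date']} {m['time']}", "%Y-%m-%d %H-%M-%S")
    return {"device": m["device"], "direction": m["direction"], "started": started}
```

This is handy when auditing the archive by hand, e.g. grouping clips by device or finding which file should contain a given moment in time.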
conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? 
How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. 
Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. 
How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. 
Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. 
The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. 
If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. 
If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). 
Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy 
prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. 
Are these used for anything after transcribing and storing in sqlite db?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Expand","depth":21,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Yes, those","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
Conversation with Gemini
You said:
Explain to me: how does the ScreenPipe audio recording work? Where does it start? How is the audio transferred and transcribed? Is there a work-in-progress stage? How can I tell from the folder structure what has been processed and what hasn't?
Gemini said:
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine. ScreenPipe is fundamentally designed to be a 100% local, privacy-first application: the vast majority of its processing happens directly on your hardware, without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system's audio routing. ScreenPipe captures two separate audio streams simultaneously:
System Audio: everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
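The chunking step described above can be sketched as a fixed-window split over raw PCM bytes. This is a minimal illustration only; the 16 kHz mono 16-bit format and the 30-second window are assumptions, not screenpipe's actual capture settings:

```python
# Cut a continuous PCM byte stream into fixed-length windows, the way a
# 24/7 recorder hands audio to a transcription queue. The 16 kHz mono
# 16-bit format and the 30 s window are illustrative assumptions.
SAMPLE_RATE = 16_000      # samples per second
BYTES_PER_SAMPLE = 2      # 16-bit mono
CHUNK_SECONDS = 30

def split_stream(pcm: bytes) -> list[bytes]:
    step = SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_SECONDS
    return [pcm[i:i + step] for i in range(0, len(pcm), step)]

# 65 seconds of silence -> two full 30 s chunks plus a 5 s remainder.
chunks = split_stream(b"\x00" * (SAMPLE_RATE * BYTES_PER_SAMPLE * 65))
print(len(chunks))  # 3
```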
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: the raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure a cloud provider such as Deepgram for faster processing, but local Whisper is the standard.)
Diarization: as it transcribes, the engine also performs "diarization", the technical term for speaker identification. It analyzes the audio to distinguish your voice from the voices of others, labeling who said what.
Storage: the final transcribed text is then indexed into a local SQLite database. The database uses Full-Text Search (FTS5), which is what lets you instantly find a phrase you heard in a meeting three weeks ago.
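To make the FTS5 point concrete, here is a minimal sketch of that kind of lookup against an in-memory database. The table and column names are illustrative, not screenpipe's actual schema, and it assumes an SQLite build with the FTS5 extension (standard in recent Python builds):

```python
import sqlite3

# Illustrative FTS5 lookup; table/column names are NOT screenpipe's real schema.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE VIRTUAL TABLE transcriptions "
    "USING fts5(body, device, timestamp UNINDEXED)"
)
con.executemany(
    "INSERT INTO transcriptions VALUES (?, ?, ?)",
    [
        ("let's move the deadline to Friday",
         "MacBook Pro Microphone (input)", "2026-05-12T12:17:23"),
        ("the quarterly numbers look good",
         "System Audio (output)", "2026-05-11T06:17:14"),
    ],
)
# Full-text search: every transcribed chunk that mentions "deadline".
rows = con.execute(
    "SELECT device, timestamp, body FROM transcriptions "
    "WHERE transcriptions MATCH ?",
    ("deadline",),
).fetchall()
print(rows)
```

The FTS5 index is what makes this query fast even over months of transcripts: instead of scanning every row, SQLite consults an inverted index of terms.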
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. During a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
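The three steps above can be sketched as a tiny producer/consumer pipeline. Everything here is a stand-in: fake_transcribe replaces the real Whisper call, and the db list replaces the SQLite commit:

```python
import queue
import threading

# Buffer -> queue -> commit, as a minimal producer/consumer sketch.
# fake_transcribe and db are stand-ins, not screenpipe internals.
chunks: queue.Queue = queue.Queue()
db: list[tuple[str, str]] = []  # stands in for the SQLite commit


def fake_transcribe(chunk: bytes) -> str:
    """Stand-in for a local Whisper call."""
    return f"text for {len(chunk)} bytes"


def worker() -> None:
    while True:
        item = chunks.get()
        if item is None:          # sentinel: recorder stopped
            break
        ts, raw = item
        db.append((ts, fake_transcribe(raw)))  # the "finalization" step


t = threading.Thread(target=worker)
t.start()
for i in range(3):                # the recorder produces fixed-length chunks
    chunks.put((f"2026-05-14T06:0{i}:00", b"\x00" * 1024))
chunks.put(None)                  # signal end of stream
t.join()
print(db)
```

If chunks arrive faster than the worker can transcribe them, they simply accumulate in the queue, which is exactly the backlog behavior described above.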
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine, usually in a hidden directory like ~/.screenpipe on macOS/Linux or in your AppData folder on Windows. You can determine what has been processed by looking at a few key areas in this directory:
The SQLite database ([…]): this is the master ledger. If text exists inside this database, the audio has been fully transcribed, diarized, and is "done."
The […] or […] folders: this is where the compressed raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: if you see rapidly changing files, temporary […] chunks, or locked database journals (like […]), the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said:
I see files in ~/.screenpipe/data/ such as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4, MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4, soundcore AeroClip (input)_2026-05-12_07-40-48.mp4, or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after being transcribed and stored in the SQLite db?
Gemini said:
Yes, those .mp4...
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (db.sqlite): This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data folder: This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary chunks, or locked database journals (such as -wal or -journal files), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
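One practical way to see what has and hasn't been done is to diff the files on disk against the file paths the database references. The sketch below assumes a table with a file_path column, which may not match ScreenPipe's real schema; adapt the query to your db.sqlite.

```python
import sqlite3
import tempfile
from pathlib import Path

def unprocessed_files(data_dir: Path, db: sqlite3.Connection) -> set:
    """Files on disk that no transcription row references yet.

    Assumes a table with a file_path column -- ScreenPipe's real
    schema may differ, so adapt the query before using this for real.
    """
    on_disk = {p.name for p in data_dir.glob("*.mp4")}
    in_db = {Path(row[0]).name for row in
             db.execute("SELECT DISTINCT file_path FROM transcriptions")}
    return on_disk - in_db

# Demo with a throwaway directory and an in-memory database.
tmp = Path(tempfile.mkdtemp())
(tmp / "done.mp4").touch()
(tmp / "pending.mp4").touch()
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE transcriptions (file_path TEXT)")
db.execute("INSERT INTO transcriptions VALUES ('done.mp4')")
print(unprocessed_files(tmp, db))   # -> {'pending.mp4'}
```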
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text: it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
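The seek position can be derived from the filename convention visible in your data folder, "<device>_YYYY-MM-DD_HH-MM-SS.mp4". How ScreenPipe's UI actually seeks is an internal detail; this sketch only shows the arithmetic.

```python
from datetime import datetime

def playback_offset(filename: str, spoken_at: datetime) -> float:
    """Seconds into the chunk at which a timestamped word was spoken.

    Relies only on the observable filename convention
    '<device>_YYYY-MM-DD_HH-MM-SS.mp4'; ScreenPipe's UI internals
    are not documented here.
    """
    stem = filename.rsplit(".", 1)[0]
    date_part, time_part = stem.rsplit("_", 2)[-2:]
    start = datetime.strptime(f"{date_part}_{time_part}", "%Y-%m-%d_%H-%M-%S")
    return (spoken_at - start).total_seconds()

offset = playback_offset(
    "System Audio (output)_2026-05-11_06-17-14.mp4",
    datetime(2026, 5, 11, 6, 19, 44),
)
print(offset)   # -> 150.0, i.e. seek 2.5 minutes into the file
```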
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
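A re-processing pass is essentially a loop over the archive with whatever engine you plug in. The sketch below keeps the engine abstract (a caller-supplied function standing in for, say, a newer Whisper build); everything else is stdlib and illustrative.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict

def reprocess(data_dir: Path,
              transcribe: Callable[[Path], str]) -> Dict[str, str]:
    """Re-run a transcription engine over every archived audio file.

    `transcribe` is a stand-in for whichever engine you plug in
    (e.g. a newer Whisper build); this sketch shows only the
    iteration, not a real model call.
    """
    return {p.name: transcribe(p) for p in sorted(data_dir.glob("*.mp4"))}

# Demo with a throwaway directory and a dummy engine.
tmp = Path(tempfile.mkdtemp())
(tmp / "System Audio (output)_2026-05-11_06-17-14.mp4").touch()
result = reprocess(tmp, lambda p: f"(new transcript of {p.name})")
print(result)
```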
3. The Source of Truth for Hallucinations...
You said
Explain to me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
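The chunking step described above can be sketched as a simple splitter over a continuous byte stream. This is illustrative Python, not ScreenPipe's actual capture code; the sample rate, sample width, and 30-second chunk length are all assumptions.

```python
import io

# Illustrative sketch of slicing a continuous PCM stream into fixed-length
# chunks; the real capture layer works differently, and these parameters
# (16 kHz, 16-bit mono, 30 s) are assumptions for the example.
SAMPLE_RATE = 16_000      # samples per second (assumed)
BYTES_PER_SAMPLE = 2      # 16-bit mono PCM (assumed)
CHUNK_SECONDS = 30        # hypothetical chunk length

CHUNK_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_SECONDS

def chunk_stream(stream):
    """Yield fixed-size chunks from a continuous audio byte stream."""
    while True:
        chunk = stream.read(CHUNK_BYTES)
        if not chunk:
            break
        yield chunk  # each chunk would be handed to the transcription stage

# Example: 65 seconds of silence splits into a 30 s, a 30 s, and a 5 s chunk.
fake_audio = io.BytesIO(b"\x00" * (SAMPLE_RATE * BYTES_PER_SAMPLE * 65))
sizes = [len(c) for c in chunk_stream(fake_audio)]
print(sizes)  # [960000, 960000, 160000]
```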
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: the raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: as it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage: the final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
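The FTS5 pattern described in the Storage step can be sketched with Python's built-in sqlite3 module. The table and column names here are illustrative, not ScreenPipe's real schema.

```python
import sqlite3

# Minimal sketch of the FTS5 storage/search pattern; "transcripts", "text",
# and "timestamp" are illustrative names, not ScreenPipe's actual schema.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE VIRTUAL TABLE transcripts USING fts5(text, timestamp UNINDEXED)"
)
con.execute(
    "INSERT INTO transcripts VALUES (?, ?)",
    ("let's move the deadline to Friday", "2026-05-12T06:49:17"),
)
con.execute(
    "INSERT INTO transcripts VALUES (?, ?)",
    ("the backup finished overnight", "2026-05-12T12:17:23"),
)

# MATCH uses the full-text index, so lookups stay fast even with months
# of accumulated audio transcripts.
rows = con.execute(
    "SELECT text, timestamp FROM transcripts WHERE transcripts MATCH ?",
    ("deadline",),
).fetchall()
print(rows)  # [("let's move the deadline to Friday", '2026-05-12T06:49:17')]
```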
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
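The three WIP steps above can be sketched as a producer/consumer queue. This is a toy model: transcribe() is a stand-in for the Whisper engine, and the committed list stands in for the SQLite commit.

```python
import queue
import threading

# Toy model of the WIP stage: capture pushes chunks into a queue, a worker
# drains it. transcribe() is a placeholder, not a real speech-to-text call.
def transcribe(chunk: bytes) -> str:
    return f"transcript of {len(chunk)} bytes"

audio_queue: "queue.Queue[bytes | None]" = queue.Queue()
committed = []  # stands in for the SQLite commit step

def worker():
    while True:
        chunk = audio_queue.get()
        if chunk is None:                    # sentinel: no more audio
            break
        committed.append(transcribe(chunk))  # "finalization"

t = threading.Thread(target=worker)
t.start()

# Producer: the capture layer can push chunks faster than they are
# transcribed; the queue depth is exactly the backlog you may see as
# temporary files on disk.
for chunk in (b"a" * 10, b"b" * 20, b"c" * 30):
    audio_queue.put(chunk)
audio_queue.put(None)
t.join()
print(committed)
```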
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite database (db.sqlite): this is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data folders: this is where the compressed raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp files: if you see rapidly changing files, temporary chunk files, or locked database journals (such as SQLite's -wal or -journal files), the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
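One mechanical way to answer "what has been done" is to cross-check the media files on disk against the database. The sketch below is hypothetical: the audio_chunks table and file_path column are assumptions, so inspect your own db.sqlite for the real schema before relying on it.

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical sketch: list media files that have no matching database row.
# "audio_chunks" and "file_path" are assumed names, not the real schema.
def unindexed_files(data_dir: Path, db_path: Path) -> list[str]:
    con = sqlite3.connect(db_path)
    indexed = {row[0] for row in con.execute("SELECT file_path FROM audio_chunks")}
    con.close()
    return sorted(str(p) for p in data_dir.glob("*.mp4") if str(p) not in indexed)

# Demo with a throwaway directory and database.
tmp = Path(tempfile.mkdtemp())
for name in ("done.mp4", "pending.mp4"):
    (tmp / name).touch()
db = tmp / "db.sqlite"
con = sqlite3.connect(db)
con.execute("CREATE TABLE audio_chunks (file_path TEXT)")
con.execute("INSERT INTO audio_chunks VALUES (?)", (str(tmp / "done.mp4"),))
con.commit()
con.close()

print(unindexed_files(tmp, db))  # only pending.mp4 is still waiting
```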
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text: it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
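The seek arithmetic behind that playback can be sketched from the filename convention visible in your own listing (e.g. System Audio (output)_2026-05-11_06-17-14.mp4): parse the chunk's start time from the name, then seek by the difference. The real UI presumably reads offsets from the database; this only illustrates the calculation.

```python
from datetime import datetime

# Sketch of the playback lookup, assuming the "<device>_YYYY-MM-DD_HH-MM-SS.mp4"
# naming seen in ~/.screenpipe/data/; the actual UI likely uses DB offsets.
def chunk_start(filename: str) -> datetime:
    # "System Audio (output)_2026-05-11_06-17-14.mp4" -> date and time suffix
    stem = filename.rsplit(".", 1)[0]
    date_part, time_part = stem.split("_")[-2:]
    return datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H-%M-%S")

def seek_seconds(filename: str, spoken_at: datetime) -> float:
    """Seconds into the chunk at which a transcribed phrase was spoken."""
    return (spoken_at - chunk_start(filename)).total_seconds()

offset = seek_seconds(
    "System Audio (output)_2026-05-11_06-17-14.mp4",
    datetime(2026, 5, 11, 6, 19, 44),
)
print(offset)  # 150.0 -> start playback 2m30s into the chunk
```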
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
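Such a re-processing pass could look like the following sketch, where transcribe_v2() is a placeholder for whichever better model you adopt later; updating the matching database rows is left out.

```python
import tempfile
from pathlib import Path

# Sketch of re-running a better model over the raw archive. transcribe_v2()
# is a placeholder, and in a real run you would also UPDATE the matching
# rows in db.sqlite with the new text.
def transcribe_v2(path: Path) -> str:
    return f"(better transcript of {path.name})"

def reprocess(data_dir: Path) -> dict[str, str]:
    """Map each archived .mp4 to a freshly generated transcript."""
    return {p.name: transcribe_v2(p) for p in sorted(data_dir.glob("*.mp4"))}

# Demo with a throwaway directory standing in for ~/.screenpipe/data/.
archive = Path(tempfile.mkdtemp())
for name in ("a.mp4", "b.mp4"):
    (archive / name).touch()
print(reprocess(archive))
```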
3. The Source of Truth for Hallucinations...
If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. 
If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). 
Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy 
prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Expand","depth":21,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
|
8234219729478526510
|
9133637656415718295
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
You said:
Explain to me: how does the ScreenPipe audio recording work? Where does it start? How does it get transferred and transcribed? Is there some work-in-progress stage? How can I see from the folder structure what has been done and what hasn't?
Gemini said:
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
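The chunking step can be sketched in a few lines. This is an illustrative model, not ScreenPipe's actual code; the 16 kHz sample rate and 30-second chunk length are assumed example values.

```python
# Illustrative sketch of splitting a continuous audio stream into
# fixed-length chunks for transcription. Not ScreenPipe's real code;
# sample rate and chunk length are arbitrary example values.

SAMPLE_RATE = 16_000    # samples per second (assumed)
CHUNK_SECONDS = 30      # one transcription unit (assumed)
CHUNK_SIZE = SAMPLE_RATE * CHUNK_SECONDS

def chunk_stream(samples):
    """Yield fixed-size chunks from a continuous sample stream;
    a final partial chunk is flushed at the end."""
    buf = []
    for s in samples:
        buf.append(s)
        if len(buf) == CHUNK_SIZE:
            yield buf
            buf = []
    if buf:
        yield buf

# 65 seconds of fake audio -> two full 30 s chunks plus a 5 s remainder
fake_audio = range(SAMPLE_RATE * 65)
chunks = list(chunk_stream(fake_audio))
print([len(c) // SAMPLE_RATE for c in chunks])  # chunk lengths in seconds
```

The key point the sketch shows: the recorder never waits for a "file" to end; it flushes a unit of work downstream every time the buffer fills.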
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: the raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure a cloud provider like Deepgram for faster processing, but local Whisper is the standard.)
Diarization: as it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage: the final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
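As a toy illustration of that storage step, here is how an FTS5 table supports instant phrase search. The table and column names are hypothetical (ScreenPipe's real schema may differ), and this assumes your SQLite build includes the FTS5 extension, as most modern builds do.

```python
import sqlite3

# Hypothetical schema for illustration only; ScreenPipe's real
# table/column names may differ.
con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE transcripts USING fts5(ts, speaker, text)")
con.executemany(
    "INSERT INTO transcripts VALUES (?, ?, ?)",
    [
        ("2026-05-12 06:49", "speaker_0", "let's move the archive to the NAS"),
        ("2026-05-12 06:50", "speaker_1", "the transfer runs slower than expected"),
    ],
)
# Full-text search: find every utterance mentioning "archive"
rows = con.execute(
    "SELECT ts, speaker, text FROM transcripts WHERE transcripts MATCH ?",
    ("archive",),
).fetchall()
print(rows)
```

FTS5 maintains an inverted index over the text column, which is why the lookup stays fast even across weeks of transcripts.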
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
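The queueing behaviour above can be modeled as a simple producer/consumer handoff. This is a toy sketch of the idea, not ScreenPipe's implementation; the chunk file names are made up.

```python
import queue

# Toy model of the WIP stage: the capture side enqueues raw chunks faster
# than the transcriber can drain them, so a backlog forms and then clears.
pending = queue.Queue()   # the processing queue (work in progress)
done = []                 # stands in for rows committed to SQLite

for i in range(5):                  # capture side: enqueue raw chunks
    pending.put(f"chunk-{i}.wav")   # hypothetical temp-file names

while not pending.empty():          # transcriber side: drain the backlog
    chunk = pending.get()
    done.append((chunk, "transcribed text"))  # commit text for this chunk

print(len(done), "chunks finalized;", pending.qsize(), "still pending")
```

The takeaway: a chunk that is captured but not yet transcribed lives only in this queue, which is why very recent audio can be temporarily unsearchable.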
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux, or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite database:
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: if you see rapidly changing files, temporary chunks, or locked database journals, this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
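One hedged way to check for that in-flight state is to scan for journal and temp files. The filename patterns below are assumptions based on common SQLite journal modes, not documented ScreenPipe behaviour.

```python
from pathlib import Path
import tempfile

def wip_indicators(root: Path) -> list[str]:
    """List journal/temp files suggesting processing is still in flight.
    Patterns are assumptions (common SQLite journal suffixes), not
    documented ScreenPipe file names."""
    patterns = ("*-wal", "*-shm", "*-journal", "*.tmp")
    return sorted(p.name for pat in patterns for p in root.rglob(pat))

# Demo against a throwaway directory standing in for ~/.screenpipe
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "db.sqlite").touch()
    (root / "db.sqlite-wal").touch()  # write-ahead log => active writes
    found = wip_indicators(root)
print(found)
```

Point it at your real ~/.screenpipe directory instead of the temp dir to see whether the engine currently has a backlog.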
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said:
I see files in ~/.screenpipe/data/ such as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4, MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4, soundcore AeroClip (input)_2026-05-12_07-40-48.mp4, or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after being transcribed and stored in the SQLite DB?
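As an aside, those filenames appear to follow a "<device> (<input|output>)_<timestamp>.mp4" convention. The pattern below is inferred purely from these examples, not from ScreenPipe documentation, so treat it as a guess.

```python
import re
from datetime import datetime

# Pattern inferred from the example filenames above; not an official spec.
PAT = re.compile(
    r"^(?P<device>.+) \((?P<direction>input|output)\)_"
    r"(?P<ts>\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})\.mp4$"
)

name = "System Audio (output)_2026-05-11_06-17-14.mp4"
m = PAT.match(name)
recorded_at = datetime.strptime(m["ts"], "%Y-%m-%d_%H-%M-%S")
print(m["device"], m["direction"], recorded_at)
```

If the guess holds, the filename alone tells you the capture device, whether it was mic input or system output, and when the recording started.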
How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. 
Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. 
The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. 
If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. 
If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). 
Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy 
prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Expand","depth":21,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false}]...
|
6170238253643292814
|
9133637656415718295
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said
Explain to me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work-in-progress stage? How can I see from the folder structure what has been done and what hasn't been done?

Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
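To make the chunking step concrete, here is a minimal sketch of slicing a continuous sample stream into fixed-length windows (the 30-second chunk length is a hypothetical choice for illustration, not ScreenPipe's actual setting):

```python
SAMPLE_RATE = 16_000   # samples per second (Whisper's native rate)
CHUNK_SECONDS = 30     # hypothetical chunk length, for illustration only

def chunk_stream(samples, sample_rate=SAMPLE_RATE, chunk_seconds=CHUNK_SECONDS):
    """Split a flat list of audio samples into fixed-duration chunks."""
    size = sample_rate * chunk_seconds
    return [samples[i:i + size] for i in range(0, len(samples), size)]

# 70 seconds of (silent) audio becomes chunks of 30 s, 30 s, and 10 s.
chunks = chunk_stream([0.0] * (SAMPLE_RATE * 70))
print([len(c) // SAMPLE_RATE for c in chunks])  # [30, 30, 10]
```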
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: As it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
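As a minimal sketch of what an FTS5 lookup looks like (the table and column names here are hypothetical, not ScreenPipe's actual schema):

```python
import sqlite3

# In-memory stand-in for ScreenPipe's database; the real schema differs.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE VIRTUAL TABLE transcriptions USING fts5(speaker, text, timestamp)"
)
db.executemany(
    "INSERT INTO transcriptions VALUES (?, ?, ?)",
    [
        ("speaker_0", "let's move the deadline to next Friday", "2026-05-12T09:00:00"),
        ("speaker_1", "the NAS backup finished overnight", "2026-05-12T09:01:00"),
    ],
)

# MATCH uses the full-text index, so this stays fast even with months of audio.
rows = db.execute(
    "SELECT speaker, text FROM transcriptions WHERE transcriptions MATCH ?",
    ("deadline",),
).fetchall()
```

The MATCH query walks the inverted index rather than scanning every row, which is what makes "find that phrase from three weeks ago" effectively instant.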
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
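The buffer, queue, and commit steps together form a classic producer/consumer pattern, which can be sketched like this (purely illustrative Python, not ScreenPipe's actual code; transcribe_chunk stands in for the Whisper call):

```python
import queue
import threading

chunk_queue = queue.Queue()  # raw audio chunks waiting for Whisper
db_rows = []                 # stand-in for the SQLite commit step

def transcribe_chunk(chunk):
    # Placeholder for the Whisper call; here we just tag the chunk.
    return f"transcript of {chunk}"

def worker():
    while True:
        chunk = chunk_queue.get()
        if chunk is None:    # sentinel: recording stopped
            break
        db_rows.append(transcribe_chunk(chunk))  # "commit" the finished text
        chunk_queue.task_done()

t = threading.Thread(target=worker)
t.start()

# The capture layer keeps producing even while transcription lags behind,
# which is exactly how a backlog (the WIP stage) builds up.
for chunk in ["chunk-001", "chunk-002", "chunk-003"]:
    chunk_queue.put(chunk)

chunk_queue.put(None)
t.join()
print(db_rows)  # ['transcript of chunk-001', 'transcript of chunk-002', 'transcript of chunk-003']
```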
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database: This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data folders: This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary chunks, or locked database journals (such as SQLite -wal or -journal files), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
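A small helper like this could classify such a directory as done vs. in progress; the folder names and glob patterns are assumptions based on the paths mentioned in this conversation, not an official ScreenPipe layout:

```python
import pathlib

def wip_status(root: pathlib.Path) -> dict:
    """Classify a ScreenPipe-style data directory: journal files suggest a
    write in progress, media files under data/ are the finalized archive.
    (Illustrative sketch; adjust patterns to your actual setup.)"""
    journals = sorted(p.name for p in root.glob("*-wal")) + \
               sorted(p.name for p in root.glob("*-journal"))
    media = sorted(p.name for p in root.glob("data/*.mp4"))
    return {"in_progress": bool(journals), "journals": journals, "finalized": media}

# Point it at the default location described above, e.g.:
# print(wip_status(pathlib.Path("~/.screenpipe").expanduser()))
```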
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in the SQLite db?

Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the...
conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? 
How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. 
Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. 
How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. 
Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. 
The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. 
If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. 
If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). 
Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy 
prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. 
Are these used for anything after transcribing and storing in sqlite db?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Expand","depth":21,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Yes, those","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files are absolutely still used after the transcription process is complete. 
They are not temporary processing chunks; they act as your permanent media archive.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is exactly what those files are doing sitting in your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"directory and how they interact with the rest of the system.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"1. The \"Time Machine\" Playback","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. The \"Time Machine\" Playback","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
|
2859498591895944029
|
9207950348869356183
|
click
|
accessibility
|
NULL
|
You said
Explain to me: how does the ScreenPipe audio recording work? Where does it start? How is it getting transferred and transcribed? Is there some work-in-progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
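The chunking step above can be sketched as follows. This is a minimal illustration, not ScreenPipe's actual code; the sample rate and chunk length are assumptions chosen for readability.

```python
# Hypothetical sketch: splitting a continuous PCM sample stream into
# fixed-length chunks, the way a recorder might hand short segments to a
# transcriber. Values below are illustrative, not ScreenPipe's real settings.

SAMPLE_RATE = 16_000       # samples per second (typical for speech models)
CHUNK_SECONDS = 30         # illustrative chunk duration

def chunk_audio(samples, sample_rate=SAMPLE_RATE, chunk_seconds=CHUNK_SECONDS):
    """Yield consecutive fixed-length slices of a sample buffer."""
    step = sample_rate * chunk_seconds
    for start in range(0, len(samples), step):
        yield samples[start:start + step]

# 65 seconds of silence -> three chunks: 30 s, 30 s, and a 5 s remainder
stream = [0] * (SAMPLE_RATE * 65)
chunks = list(chunk_audio(stream))
print([len(c) // SAMPLE_RATE for c in chunks])  # [30, 30, 5]
```

The last chunk is simply shorter, which matches how continuous capture ends up with a trailing partial segment when recording stops.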
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: As it transcribes, the engine also performs "diarization" (speaker identification). It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
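That FTS5 search can be demonstrated with an in-memory database. The table and column names here (`transcripts`, `text`) are assumptions for illustration, not ScreenPipe's actual schema.

```python
import sqlite3

# Minimal FTS5 sketch: index two fake transcript rows, then run a full-text
# MATCH query, the same mechanism that makes phrase search over old
# transcripts effectively instant.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE transcripts USING fts5(text)")
conn.execute(
    "INSERT INTO transcripts VALUES "
    "('we agreed to ship the archive feature on friday')"
)
conn.execute("INSERT INTO transcripts VALUES ('lunch order for the team')")

# MATCH uses the full-text index rather than scanning every row
rows = conn.execute(
    "SELECT text FROM transcripts WHERE transcripts MATCH ?", ("archive",)
).fetchall()
print(rows)  # [('we agreed to ship the archive feature on friday',)]
```

The same query shape works against any FTS5 table, which is why a generic SQLite browser (like the SQLite Web tabs in this session) can search the archive directly.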
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
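The buffer, queue, and finalize steps can be sketched as a tiny producer/consumer loop. Everything here is illustrative: `fake_transcribe` stands in for Whisper, and the list `db` stands in for the SQLite commit.

```python
import queue

# Sketch of the WIP pipeline: recorder pushes raw chunks into a queue,
# a transcriber drains it and "commits" finished text.

def fake_transcribe(chunk: bytes) -> str:
    # Stand-in for a real speech-to-text call
    return f"transcript of {len(chunk)} bytes"

work = queue.Queue()   # the processing queue (the WIP stage)
db = []                # stands in for committing rows to SQLite

# Producer: chunks arrive as they are captured
for chunk in (b"\x00" * 100, b"\x00" * 250):
    work.put(chunk)

# Consumer: drain the backlog in order
while not work.empty():
    db.append(fake_transcribe(work.get()))

print(db)  # ['transcript of 100 bytes', 'transcript of 250 bytes']
```

In a real recorder the producer and consumer run concurrently, so the queue depth at any moment is exactly the "backlog" described above.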
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database: This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data folders: This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary chunks, or locked database journals, this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
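A quick way to apply that "done vs. in flight" reading is to classify a capture directory by filename. The suffixes below (`.tmp`, `-wal`, `-journal`) are common conventions and an assumption about what temporary artifacts look like, not confirmed ScreenPipe behavior.

```python
import pathlib
import tempfile

# Hypothetical sketch: split a capture folder's contents into finished media
# vs. in-flight artifacts (temp chunks, SQLite write-ahead/journal files).

def classify(folder: pathlib.Path) -> dict:
    done, wip = [], []
    for f in sorted(folder.iterdir()):
        if f.suffix == ".tmp" or f.name.endswith(("-wal", "-journal")):
            wip.append(f.name)
        else:
            done.append(f.name)
    return {"done": done, "wip": wip}

# Demonstrate against a throwaway directory with fake files
with tempfile.TemporaryDirectory() as d:
    root = pathlib.Path(d)
    for name in ("meeting.mp4", "chunk_001.tmp", "db.sqlite-wal"):
        (root / name).touch()
    result = classify(root)

print(result)
# {'done': ['meeting.mp4'], 'wip': ['chunk_001.tmp', 'db.sqlite-wal']}
```

An empty "wip" list suggests the engine has caught up with its backlog; journal files that persist indicate writes are still in progress.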
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
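The filenames listed in the question follow a visible device/direction/timestamp convention, which can be parsed mechanically. The regex here is inferred from those examples only, not from ScreenPipe's source code.

```python
import re

# Sketch: parse names like "System Audio (output)_2026-05-11_06-17-14.mp4"
# into their device, direction, and capture-time parts. Pattern is inferred
# from the example filenames, not guaranteed to cover every variant.
PATTERN = re.compile(
    r"^(?P<device>.+) \((?P<direction>input|output)\)_"
    r"(?P<date>\d{4}-\d{2}-\d{2})_(?P<time>\d{2}-\d{2}-\d{2})\.mp4$"
)

def parse_capture_name(name: str):
    """Return the parsed fields as a dict, or None if the name doesn't match."""
    m = PATTERN.match(name)
    return m.groupdict() if m else None

info = parse_capture_name("System Audio (output)_2026-05-11_06-17-14.mp4")
print(info)
# {'device': 'System Audio', 'direction': 'output',
#  'date': '2026-05-11', 'time': '06-17-14'}
```

Parsing names this way makes it easy to group recordings per device or per day when auditing how much of the archive each input source produced.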
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the...
|
38825
|
NULL
|
NULL
|
NULL
|
|
38825
|
1439
|
87
|
2026-05-14T06:29:24.825512+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740164825_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
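The chunking step described above can be sketched in a few lines. The 30-second window and the function name here are illustrative assumptions, not ScreenPipe internals:

```python
# Sketch of the chunking idea: split a continuous stream of audio samples
# into fixed-length chunks ready for transcription. Chunk length is assumed.
def chunk_samples(samples: list[float], sample_rate: int, seconds: int = 30) -> list[list[float]]:
    size = sample_rate * seconds          # samples per chunk
    return [samples[i:i + size] for i in range(0, len(samples), size)]

# 70 seconds of silence at 8 kHz splits into three chunks: 30 s, 30 s, 10 s.
chunks = chunk_samples([0.0] * (8000 * 70), sample_rate=8000)
print([len(c) // 8000 for c in chunks])   # chunk lengths in seconds
```

The real pipeline works on native audio buffers rather than Python lists, but the windowing logic is the same.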
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: As it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
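The FTS5 mechanism can be seen in a minimal sketch. The table and column names here are hypothetical, not ScreenPipe's actual schema:

```python
import sqlite3

# Minimal FTS5 demo: an indexed text column plus an unindexed timestamp.
# Table/column names are hypothetical, not ScreenPipe's real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE transcripts USING fts5(text, timestamp UNINDEXED)")
conn.executemany(
    "INSERT INTO transcripts (text, timestamp) VALUES (?, ?)",
    [
        ("let us schedule the RAID migration for Friday", "2026-04-21T10:02:11Z"),
        ("lunch plans for tomorrow", "2026-04-21T12:30:05Z"),
    ],
)
# A full-text MATCH query finds the chunk where the keyword was spoken;
# the default tokenizer case-folds ASCII, so 'raid' matches "RAID".
row = conn.execute(
    "SELECT timestamp FROM transcripts WHERE transcripts MATCH 'raid'"
).fetchone()
print(row[0])  # 2026-04-21T10:02:11Z
```

This is the same mechanism that lets a search over weeks of transcripts return in milliseconds: the index maps tokens to rows, so no full scan is needed.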
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (db.sqlite): This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data folders: This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary audio chunks, or locked database journals (like -wal or -journal files), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
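Under those assumptions, a quick script can classify a ScreenPipe directory into finished media versus in-flight signals. The glob patterns are assumptions based on the description above, not guaranteed ScreenPipe behavior:

```python
from pathlib import Path

# Assumed layout per the description above: finished media lands under data/,
# while SQLite -wal/-shm/-journal sidecar files signal in-flight writes (WIP).
def archived_media(root: Path) -> list[Path]:
    # Permanently stored, already-processed media files.
    return sorted((root / "data").rglob("*.mp4"))

def wip_signals(root: Path) -> list[Path]:
    # Journal/WAL sidecars next to the database mean writes are in flight.
    patterns = ("*.sqlite-wal", "*.sqlite-shm", "*.sqlite-journal", "*.db-wal")
    return sorted(p for pat in patterns for p in root.glob(pat))
```

Pointing it at ~/.screenpipe while the daemon is busy: a persistently non-empty wip_signals list suggests a transcription backlog that has not yet been committed.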
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in the SQLite db?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text: it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth to verify what was actually said.
How the Database Views Them
If you were to open up your db.sqlite and query the tables (e.g., audio_transcriptions), you would see that the database treats the ~/.screenpipe/data/ folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like soundcore AeroClip (input)_2026-05-12_07-40-48.mp4.
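Because those references are only soft foreign keys, you can audit them yourself. Here is a sketch; the audio_transcriptions table and file_path column names are assumptions echoing the example above, not ScreenPipe's exact schema:

```python
import sqlite3
from pathlib import Path

# Sketch: list transcript rows whose referenced media file no longer exists
# on disk. Table and column names are assumptions, not the real schema.
def orphaned_rows(db_path: str, table: str = "audio_transcriptions",
                  col: str = "file_path") -> list[str]:
    conn = sqlite3.connect(db_path)
    paths = [r[0] for r in conn.execute(f"SELECT DISTINCT {col} FROM {table}")]
    conn.close()
    return [p for p in paths if p and not Path(p).exists()]
```

Running this before and after a cleanup tells you exactly which transcripts have lost their playback audio.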
What happens if you delete them?
If you manually rm those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection: ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving: If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly move older files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
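The core step of that cron job might look like this sketch. The paths and the function name are hypothetical, and you should test on a copy first, since an interrupted move can strand files:

```python
import shutil
from pathlib import Path

# Move an old media file to the NAS mount, then leave a symlink behind so
# the file path stored in db.sqlite keeps resolving. Paths are hypothetical.
def archive_with_symlink(local_file: Path, nas_dir: Path) -> Path:
    nas_dir.mkdir(parents=True, exist_ok=True)
    dest = nas_dir / local_file.name
    shutil.move(str(local_file), str(dest))  # copies across filesystems, then unlinks
    local_file.symlink_to(dest)              # database reference stays valid
    return dest
```

One caveat of this design: if the NAS is unmounted, playback fails with a dangling symlink instead of a hard "file not found", which is effectively the same failure mode described above.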
You said
Yes, I will do that. Is there a way to set up languages for transcription?
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription: yes, you can configure it, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
By default, ScreenPipe uses a multilingual Whisper model and sets the language configuration to Auto-Detect. When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage: If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback: Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
1. Open the ScreenPipe settings.
2. Navigate to the Audio or AI/Transcription section.
3. Look for the Whisper Language or Transcription Language dropdown.
4. Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).
If you are running ScreenPipe via CLI/Config:
You can modify your underlying configuration (usually found in ~/.screenpipe/pipe.json or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:
sk (Slovak)
bg (Bulgarian)
en (English)
If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
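As a tiny helper, the mapping from those language names to the ISO 639-1 codes above can be captured in a lookup. The function name is illustrative only, not a ScreenPipe API:

```python
# ISO 639-1 codes for the languages discussed above; anything else falls
# back to "auto" (Whisper's auto-detection). Helper name is illustrative.
LANG_CODES = {"Slovak": "sk", "Bulgarian": "bg", "English": "en"}

def transcription_language(name: str) -> str:
    return LANG_CODES.get(name, "auto")

print(transcription_language("Slovak"))   # sk
print(transcription_language("Czech"))    # auto
```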
Screenpipe [archive.db · 2071.1MB]
Screenpipe...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — 
Archive","depth":5,"bounds":{"left":0.013297873,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: 
db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New 
Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat 
settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share 
conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? 
How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. 
Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. 
How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. 
Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. 
3. The "Work in Progress" (WIP) Stage

There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:

- Buffering: Audio is recorded into a temporary buffer in your system's RAM, or written as temporary chunk files on your disk.
- Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
- Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.

4. Understanding the Folder Structure

All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux, or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:

- The SQLite database (db.sqlite): This is the master ledger. If text exists inside this database, the audio has been fully transcribed, diarized, and is "done."
- The data folder: This is where the compressed raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
- Temp files: If you see rapidly changing files, temporary chunks, or locked database journals, the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.

Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
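The "done vs. WIP" check described above can be scripted: compare the media files on disk against the rows the database already references. A minimal sketch, assuming a table named audio_transcriptions with a file_path column (the real ScreenPipe schema may differ between versions):

```python
import sqlite3
from pathlib import Path

def find_unprocessed(data_dir: str, db_path: str) -> list[str]:
    """Return media files on disk with no transcription row yet.

    Assumes a table `audio_transcriptions` with a `file_path` column;
    the actual ScreenPipe schema may differ.
    """
    con = sqlite3.connect(db_path)
    try:
        done = {row[0] for row in
                con.execute("SELECT DISTINCT file_path FROM audio_transcriptions")}
    finally:
        con.close()
    # Anything in data/ that the database does not reference is still WIP.
    return sorted(str(p) for p in Path(data_dir).glob("*.mp4")
                  if str(p) not in done)
```

Point it at ~/.screenpipe/data and ~/.screenpipe/db.sqlite; an empty result means the backlog is clear.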
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in the SQLite db?

Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive. Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.

1. The "Time Machine" Playback

The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references. When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text: it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.

2. Re-processing and Model Upgrades

Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.

3. The Source of Truth for Hallucinations

Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth to verify what was actually said.

How the Database Views Them

If you were to open up your db.sqlite and query the tables (e.g., audio_transcriptions), you would see that the database treats the ~/.screenpipe/data/ folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like soundcore AeroClip (input)_2026-05-12_07-40-48.mp4.

What happens if you delete them? If you manually rm those files to free up space on your Mac's SSD, ScreenPipe won't crash, and the SQLite database will still hold all your searchable text.
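That soft-foreign-key layout can be inspected directly with Python's built-in sqlite3. A sketch, again assuming an audio_transcriptions table with file_path, timestamp, and transcription columns (names inferred from the conversation, not a verified schema):

```python
import sqlite3

def search_transcripts(db_path: str, keyword: str) -> list[tuple]:
    """Find transcript snippets matching a keyword, along with the
    media file each row points back to. Column names are assumptions."""
    con = sqlite3.connect(db_path)
    con.row_factory = sqlite3.Row  # access columns by name
    try:
        rows = con.execute(
            "SELECT file_path, timestamp, transcription "
            "FROM audio_transcriptions "
            "WHERE transcription LIKE ? ORDER BY timestamp",
            (f"%{keyword}%",),
        ).fetchall()
        return [(r["file_path"], r["timestamp"], r["transcription"]) for r in rows]
    finally:
        con.close()
```

Each hit hands back the .mp4 path, which is exactly what the playback UI uses to jump to the right second of audio.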
However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.

Managing the Storage Footprint

Because ScreenPipe records continuously, this folder will inevitably grow over time.

- Built-in garbage collection: ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days, or once the folder hits a specific GB limit).
- Archiving: If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly move older files to your 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
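The cron-driven archiving idea can be sketched in a few lines: move media older than a cutoff to a NAS mount and leave a symlink behind so the database's file_path references keep resolving. The paths and the 30-day cutoff are illustrative, not ScreenPipe defaults:

```python
import os
import shutil
import time
from pathlib import Path

def archive_old_media(data_dir: str, archive_dir: str, max_age_days: int = 30) -> int:
    """Move .mp4 files older than max_age_days to archive_dir, leaving
    symlinks behind so db.sqlite file_path references keep working."""
    cutoff = time.time() - max_age_days * 86400
    moved = 0
    Path(archive_dir).mkdir(parents=True, exist_ok=True)
    for src in Path(data_dir).glob("*.mp4"):
        if src.is_symlink() or src.stat().st_mtime >= cutoff:
            continue  # already archived, or still fresh
        dst = Path(archive_dir) / src.name
        shutil.move(str(src), str(dst))
        os.symlink(dst, src)  # playback via the old path still works
        moved += 1
    return moved
```

Run it daily from cron against ~/.screenpipe/data and the mounted NAS share; the symlink check makes it safe to re-run.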
You said
Yes, I will do that. Is there a way to set up languages for transcription?

Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS, if you end up building it) will keep your local SSD lean while keeping your database searchable.

As for setting up languages for transcription: yes, you can configure it, but it helps to understand how ScreenPipe handles language natively first.

How the Transcription Engine Handles Language

By default, ScreenPipe uses a multilingual Whisper model and sets the language configuration to Auto-Detect. When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.

- The advantage: If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
- The drawback: Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.

How to Force a Specific Language

If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing, because the model skips the detection phase.

If you are using the Desktop App UI:
- Open the ScreenPipe settings.
- Navigate to the Audio or AI/Transcription section.
- Look for the Whisper Language or Transcription Language dropdown.
- Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).

If you are running ScreenPipe via CLI/config:
You can modify your underlying configuration (usually found in ~/.screenpipe/pipe.json, or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:
- sk (Slovak)
- bg (Bulgarian)
- en (English)

If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper "Small" or "Base" instead of "Tiny") if your hardware has the headroom for it.
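The config-file route can be sketched as a small helper that writes the forced language into a JSON settings file. Treat the "language" key, the config location, and the helper name as assumptions for illustration; check your actual ScreenPipe settings file or the daemon's --help output for the real option names:

```python
import json
from pathlib import Path

# ISO 639-1 codes for the languages discussed above.
LANG_CODES = {"Slovak": "sk", "Bulgarian": "bg", "English": "en"}

def set_transcription_language(config_path: str, language: str) -> dict:
    """Write a forced transcription language into a JSON config file.

    The `language` key and the config path are assumptions; verify them
    against your own ScreenPipe installation before relying on this."""
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    config["language"] = LANG_CODES.get(language, language)  # name or raw code
    path.write_text(json.dumps(config, indent=2))
    return config
```

Switching back to auto-detection would then just mean removing the key (or restoring "Auto" in the UI).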
|
5990861328622560469
|
8632611152042011613
|
visual_change
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said
Explain to me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work-in-progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system's audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: As it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
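The FTS5 indexing described above can be sketched with Python's built-in sqlite3 module. The table and column names below are illustrative, not ScreenPipe's actual schema:

```python
import sqlite3

# Toy FTS5 index over transcripts; the schema is illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE transcripts USING fts5(text, file_path)")
conn.execute(
    "INSERT INTO transcripts VALUES (?, ?)",
    ("let's move the deadline to Friday",
     "data/System Audio (output)_2026-05-11_06-17-14.mp4"),
)
# Phrase search: which recording contains these exact words?
row = conn.execute(
    "SELECT file_path FROM transcripts WHERE transcripts MATCH ?",
    ('"move the deadline"',),
).fetchone()
```

A real deployment would also store a timestamp per row so playback can seek straight to the match.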
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
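The three steps above amount to a classic producer/consumer pipeline. A minimal sketch, where transcribe() is a stand-in for the Whisper call and the results list stands in for the database commit:

```python
import queue
import threading

# Toy sketch of the buffer -> queue -> finalize flow described above.
# transcribe() is a stand-in, not ScreenPipe's actual engine call.
chunks = queue.Queue()
results = []

def transcribe(chunk: bytes) -> str:
    return f"[{len(chunk)} bytes transcribed]"

def worker() -> None:
    while True:
        chunk = chunks.get()        # blocks until the capture layer pushes audio
        if chunk is None:           # sentinel: no more audio
            break
        results.append(transcribe(chunk))  # "commit to the database"

t = threading.Thread(target=worker)
t.start()
for c in (b"audio-1", b"audio-2"):  # capture layer producing raw chunks
    chunks.put(c)
chunks.put(None)
t.join()
```

If chunks arrive faster than the worker drains them, they simply accumulate in the queue, which is exactly the backlog behavior described above.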
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite database (db.sqlite): This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data folders: This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp files: If you see rapidly changing files, temporary chunks, or locked database journals (like db.sqlite-wal), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
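Those checks can be automated. A rough sketch, assuming the default layout (db.sqlite plus a data/ folder under the root); adjust the paths to whatever your own install actually shows:

```python
from pathlib import Path

def capture_status(root: Path) -> dict:
    """Rough health check of a ScreenPipe-style data directory.

    Assumes db.sqlite and a data/ media folder under `root`;
    these paths are assumptions based on a default install.
    """
    db = root / "db.sqlite"
    data = root / "data"
    return {
        "db_exists": db.exists(),
        # A -wal/-journal file next to the db suggests writes in flight (WIP).
        "db_busy": any(root.glob("db.sqlite-*")),
        "media_files": sum(1 for _ in data.glob("**/*.mp4")) if data.exists() else 0,
    }
```

Run it against ~/.screenpipe: db_busy flipping to True while media_files grows is the WIP stage in action.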
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text: it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
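The seek arithmetic implied here is simple. A sketch that derives a playback offset from the filename convention shown in the question ("..._YYYY-MM-DD_HH-MM-SS.mp4") and a transcript row's ISO timestamp; this is the idea, not ScreenPipe's real code:

```python
from datetime import datetime

def seek_offset(filename: str, spoken_at: str) -> float:
    """Seconds into the recording at which the transcript line was spoken.

    Parses the recording start time from the trailing date stamp in the
    filename (convention taken from the examples above).
    """
    stamp = filename.rsplit(".", 1)[0][-19:]       # e.g. "2026-05-11_06-17-14"
    start = datetime.strptime(stamp, "%Y-%m-%d_%H-%M-%S")
    return (datetime.fromisoformat(spoken_at) - start).total_seconds()
```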
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth to verify what was actually said.
How the Database Views Them
If you were to open up your db.sqlite and query the tables (e.g., audio_transcriptions), you would see that the database treats the ~/.screenpipe/data/ folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like soundcore AeroClip (input)_2026-05-12_07-40-48.mp4.
What happens if you delete them?
If you manually rm those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection: ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving: If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly move older files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
You said
Yes, I will do that. Is there a way to set up languages for transcription?
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription, yes, you can configure it, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
By default, ScreenPipe uses a multilingual Whisper model and sets the language configuration to Auto-Detect. When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage: If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback: Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up processing because the model skips the detection phase.
If you are using the Desktop App UI:
Open the ScreenPipe settings.
Navigate to the Audio or AI/Transcription section.
Look for the Whisper Language or Transcription Language dropdown.
Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).
If you are running ScreenPipe via CLI/Config:
You can modify your underlying configuration (usually found in ~/.screenpipe/pipe.json or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO 639-1 language codes: sk (Slovak), bg (Bulgarian), en (English).
If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
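As a purely hypothetical illustration of the config approach (the actual key name, file location, and CLI flag may differ; check screenpipe --help or the settings file on your own machine before copying this):

```json
{
  "transcription": {
    "language": "sk"
  }
}
```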
Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. 
How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. 
Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. 
The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. 
If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. 
If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). 
Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy 
prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. 
Are these used for anything after transcribing and storing in sqlite db?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Expand","depth":21,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Yes, those","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files are absolutely still used after the transcription process is complete. 
They are not temporary processing chunks; they act as your permanent media archive.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is exactly what those files are doing sitting in your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"directory and how they interact with the rest of the system.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"1. The \"Time Machine\" Playback","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. The \"Time Machine\" Playback","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The primary reason ScreenPipe keeps these files is for audio playback. 
When Whisper transcribes your meetings or ambient audio, it writes the text into the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"database along with","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"precise timestamps and file path references","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. Re-processing and Model Upgrades","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. Re-processing and Model Upgrades","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Local LLMs and transcription models are improving rapidly. 
Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The Source of Truth for Hallucinations","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The Source of Truth for Hallucinations","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. 
If a database query returns a completely nonsensical sentence, that","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file is your fallback source of truth to verify what was actually said.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How the Database Views Them","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How the Database Views Them","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you were to open up your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"and query the tables (e.g.,","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"audio_transcriptions","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), you would see that the database treats the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folder essentially as a blob storage backend. 
The tables contain columns acting as soft foreign keys pointing to filenames like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"soundcore AeroClip (input)_2026-05-12_07-40-48.mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"What happens if you delete them?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you manually","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"rm","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. 
However, any attempt to play back the audio for those older transcripts will silently fail or throw a \"file not found\" error in the background logs.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Managing the Storage Footprint","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Managing the Storage Footprint","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe records continuously, this folder will inevitably grow over time.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Built-in Garbage Collection:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
|
-6778979162278726760
|
8636063624999889877
|
click
|
accessibility
|
NULL
|
Conversation with Gemini
You said
Explain to me: how does the ScreenPipe audio recording work? Where does it start? How is it getting transferred and transcribed? Is there some work-in-progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine. ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
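The chunking idea described above can be sketched in a few lines of Python. This is illustrative only: ScreenPipe's actual capture code is native, and the 30-second chunk length and 16 kHz sample rate here are assumptions, not confirmed settings.

```python
# Illustrative only: cutting a continuous audio stream into fixed-duration
# chunks before transcription. Chunk length and sample rate are assumptions.

SAMPLE_RATE = 16_000        # samples per second (typical for speech models)
CHUNK_SECONDS = 30          # assumed chunk length
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS

def split_into_chunks(samples):
    """Yield successive fixed-size chunks from a flat list of samples."""
    for start in range(0, len(samples), CHUNK_SAMPLES):
        yield samples[start:start + CHUNK_SAMPLES]

# Simulate 65 seconds of silence and verify the split: 30s + 30s + 5s.
stream = [0] * (SAMPLE_RATE * 65)
chunks = list(split_into_chunks(stream))
print([len(c) // SAMPLE_RATE for c in chunks])  # -> [30, 30, 5]
```

The last, shorter chunk is kept rather than dropped, since a recorder must not lose the tail of a conversation.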
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: As it transcribes the text, the engine also performs "diarization", the technical term for speaker identification. It analyzes the audio to distinguish your voice from the voices of others, labeling who said what.
Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
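What an FTS5-backed search looks like can be shown with Python's built-in sqlite3 module. The table and column names below are made up for this sketch; they are not ScreenPipe's actual schema.

```python
import sqlite3

# Hypothetical schema for the demo -- NOT ScreenPipe's real tables.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE transcripts_fts USING fts5(text, timestamp)"
)
conn.execute(
    "INSERT INTO transcripts_fts VALUES (?, ?)",
    ("let's move the quarterly review to Thursday", "2026-05-12T09:15:00"),
)
conn.execute(
    "INSERT INTO transcripts_fts VALUES (?, ?)",
    ("the backup finished overnight", "2026-05-12T11:02:00"),
)

# MATCH uses the full-text index instead of scanning every row.
rows = conn.execute(
    "SELECT timestamp, text FROM transcripts_fts WHERE transcripts_fts MATCH ?",
    ("quarterly",),
).fetchall()
print(rows)  # one hit: the meeting sentence, with its timestamp
```

This is why searching weeks of transcripts feels instant: the MATCH query hits a prebuilt inverted index rather than reading every stored sentence.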
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
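The buffer-to-queue-to-commit flow above can be sketched as a deliberately simplified, single-threaded pipeline. The fake transcriber and the in-memory "database" dict stand in for Whisper and SQLite; none of this is ScreenPipe's real code.

```python
from collections import deque

# Simplified WIP pipeline sketch: buffer -> queue -> transcribe -> commit.
work_queue = deque()
database = {}  # timestamp -> transcribed text (stand-in for SQLite)

def fake_transcribe(chunk):
    """Stand-in for a Whisper call; just labels the chunk."""
    return f"transcript of {chunk}"

# Producer: the capture loop enqueues raw chunks as they arrive.
for ts, chunk in [(1, "chunk-a"), (2, "chunk-b"), (3, "chunk-c")]:
    work_queue.append((ts, chunk))

# Consumer: drain the backlog; each finished chunk is committed at once,
# so a crash mid-backlog loses at most the un-transcribed queue entries.
while work_queue:
    ts, chunk = work_queue.popleft()
    database[ts] = fake_transcribe(chunk)

print(database[3])  # -> "transcript of chunk-c"
```

The key property is that capture and transcription are decoupled: recording never blocks on a slow transcription step, it only makes the queue longer.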
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux, or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite database (db.sqlite): This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data folder: This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary chunk files, or locked database journals, this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
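A quick way to inspect such a directory yourself is a small pathlib script. The layout built below is fabricated for the demo (the `-wal` suffix is standard SQLite write-ahead-log naming, used here as an example of a journal file); only `~/.screenpipe` and the `data` folder are named in the conversation above.

```python
import tempfile
from pathlib import Path

# Build a fake ScreenPipe-style directory so the sketch is runnable.
root = Path(tempfile.mkdtemp())
(root / "db.sqlite").touch()
(root / "db.sqlite-wal").touch()          # SQLite journal: writes in flight
(root / "data").mkdir()
(root / "data" / "System Audio (output)_2026-05-11_06-17-14.mp4").touch()

# Archived media: anything already written to the data folder.
media = sorted(p.name for p in (root / "data").glob("*.mp4"))

# A journal file next to the database suggests writes are still landing.
busy = any(root.glob("db.sqlite-*"))

print(len(media), busy)  # -> 1 True
```

Pointing `root` at your real `~/.screenpipe` directory (read-only, no `touch`/`mkdir` calls) gives the same two signals: what is archived, and whether the database is mid-write.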
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Gemini said
Yes, those
.mp4
files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your
~/.screenpipe/data/
directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the
db.sqlite
database along with
precise timestamps and file path references
.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific
.mp4
file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
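The seek logic described above follows directly from the filename convention: the name encodes when the recording started, so the playback offset is just (moment spoken) minus (file start). The filenames here are taken from the ones listed later in this conversation.

```python
from datetime import datetime

def seek_offset(filename: str, spoken_at: datetime) -> float:
    """Derive the playback offset (in seconds) from a filename like
    'MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4', whose last
    two underscore-separated fields encode the recording start time."""
    stamp = "_".join(filename.rsplit("_", 2)[1:]).removesuffix(".mp4")
    start = datetime.strptime(stamp, "%Y-%m-%d_%H-%M-%S")
    return (spoken_at - start).total_seconds()

offset = seek_offset(
    "MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4",
    datetime(2026, 5, 12, 12, 19, 53),
)
print(offset)  # 150.0 -> start playback 2m30s into the file
```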
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw
.mp4
files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
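The re-processing idea above amounts to a loop over the archive. `transcribe_with_new_model` is a hypothetical callable standing in for whatever upgraded engine you run later; the point is only that keeping the raw .mp4 files makes this loop possible at all.

```python
from pathlib import Path

def reprocess(data_dir: Path, transcribe_with_new_model) -> dict:
    """Re-run a (hypothetical) newer transcription engine over every
    archived clip, returning {filename: new transcript}."""
    results = {}
    for clip in sorted(data_dir.glob("*.mp4")):
        results[clip.name] = transcribe_with_new_model(clip)
    return results

# e.g. reprocess(Path.home() / ".screenpipe" / "data", better_whisper)
```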
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that
.mp4
file is your fallback source of truth to verify what was actually said.
How the Database Views Them
If you were to open up your
db.sqlite
and query the tables (e.g.,
audio_transcriptions
), you would see that the database treats the
~/.screenpipe/data/
folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like
soundcore AeroClip (input)_2026-05-12_07-40-48.mp4
.
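The "soft foreign key" idea above can be demonstrated in miniature. The exact column names in Screenpipe's audio_transcriptions table are assumptions here (file_path, transcription, timestamp); inspect your own db.sqlite to confirm them.

```python
import sqlite3

# Stand-in for ~/.screenpipe/db.sqlite; column names are assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audio_transcriptions "
    "(file_path TEXT, transcription TEXT, timestamp TEXT)"
)
conn.execute(
    "INSERT INTO audio_transcriptions VALUES (?, ?, ?)",
    (
        "soundcore AeroClip (input)_2026-05-12_07-40-48.mp4",
        "morning standup notes",
        "2026-05-12T07:41:02",
    ),
)

# Nothing enforces that file_path still exists on disk -- it is just text,
# which is exactly what "soft foreign key" means here.
row = conn.execute(
    "SELECT file_path, transcription FROM audio_transcriptions"
).fetchone()
print(row)
```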
What happens if you delete them?
If you manually
rm
those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
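That failure mode can be detected proactively: compare the paths the database references against what is actually on disk. Table and column names are assumed, as above; adjust them to whatever your db.sqlite actually contains.

```python
import sqlite3
from pathlib import Path

def orphaned_rows(db: Path, data_dir: Path) -> list:
    """List referenced media files that no longer exist on disk -- the
    transcripts whose playback would silently fail. Assumes an
    audio_transcriptions table with a file_path column."""
    conn = sqlite3.connect(db)
    paths = [r[0] for r in conn.execute(
        "SELECT DISTINCT file_path FROM audio_transcriptions")]
    conn.close()
    return [p for p in paths if not (data_dir / p).exists()]
```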
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection:
ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving:
If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly move older files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
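That archive-and-symlink job could look like the sketch below, meant to run from cron. The NAS mount point and the 30-day cutoff are examples, not recommendations.

```python
import shutil
from datetime import datetime, timedelta
from pathlib import Path

def offload(data_dir: Path, nas_dir: Path, keep_days: int = 30) -> None:
    """Move media older than keep_days to the NAS, leaving a symlink at
    the original path so database file references keep resolving."""
    cutoff = datetime.now() - timedelta(days=keep_days)
    for clip in data_dir.glob("*.mp4"):
        if clip.is_symlink():
            continue  # already offloaded on a previous run
        if datetime.fromtimestamp(clip.stat().st_mtime) >= cutoff:
            continue  # still recent, leave it on the local SSD
        target = nas_dir / clip.name
        shutil.move(str(clip), target)  # copy to NAS, remove local copy
        clip.symlink_to(target)         # keep the database path valid

# e.g. offload(Path.home() / ".screenpipe" / "data",
#              Path("/Volumes/nas/screenpipe"))
```

shutil.move handles the cross-filesystem copy for you; the symlink is what keeps playback working after the bytes have left the SSD.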
You said
yes I will do that. Is there a way to set up languages for transcription?
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription,
yes, you can configure it
, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
By default, ScreenPipe uses a
multilingual Whisper model
and sets the language configuration to
Auto-Detect
.
When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage:
If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback:
Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
Open the ScreenPipe settings.
Navigate to the
Audio
or
AI/Transcription
section.
Look for the
Whisper Language
or
Transcription Language
dropdown.
Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).
If you are running ScreenPipe via CLI/Config:
You can modify your underlying configuration (usually found in
~/.screenpipe/pipe.json
or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:
sk (Slovak)
bg (Bulgarian)
en (English)
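A config-editing helper for this could look like the sketch below. This is hypothetical: the file location (~/.screenpipe/pipe.json, as mentioned above) and the "language" key name are assumptions, so check Screenpipe's own documentation for the real schema before using anything like it.

```python
import json
from pathlib import Path

def set_language(path: Path, code: str) -> None:
    """Write an ISO 639-1 language code into a JSON config file.
    Both the file layout and the 'language' key are assumed, not
    taken from Screenpipe's actual documentation."""
    cfg = json.loads(path.read_text()) if path.exists() else {}
    cfg["language"] = code  # e.g. "sk", "bg", "en"
    path.write_text(json.dumps(cfg, indent=2))

# e.g. set_language(Path.home() / ".screenpipe" / "pipe.json", "sk")
```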
If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
Good response
Bad response
Redo
Share and export
Copy
Show more options
Ask Gemini
Ask Gemini
Open upload file menu
Tools
Open mode picker
Pro
Microphone
Send message
Gemini is AI and can make mistakes, including about people.
Your privacy and Gemini Opens in a new window
Your privacy and Gemini
Opens in a new window
Summarize page
Summarize page
Screenpipe [archive.db · 2071.1MB]
Screenpipe
[archive.db · 2071.1MB]
Activity
Search
Audio
Work Report
Timetable
AI Summary
Date
12
/
05
/
2026
Calendar
Monitor...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — 
Archive","depth":5,"bounds":{"left":0.013297873,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: 
db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New 
Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat 
settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share 
conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? 
How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. 
Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. 
2. How It Gets Transferred and Transcribed

Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.

Here is the transcription pipeline:

- The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
- Diarization: As it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
- Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
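The FTS5 storage step can be sketched directly with Python's built-in sqlite3 module. The table and column names below are placeholders for illustration, not ScreenPipe's confirmed schema:

```python
import sqlite3

# Minimal sketch of FTS5-backed transcript search.
# Table/column names (transcripts, body, ts) are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE transcripts USING fts5(body, ts UNINDEXED)")
conn.executemany(
    "INSERT INTO transcripts (body, ts) VALUES (?, ?)",
    [
        ("let's move the deadline to Friday", "2026-05-12T09:14:00"),
        ("the RAID array finished rebuilding", "2026-05-12T10:02:00"),
    ],
)
# MATCH searches the full-text index, which stays fast even across
# weeks of accumulated transcripts.
rows = conn.execute(
    "SELECT body, ts FROM transcripts WHERE transcripts MATCH 'deadline'"
).fetchall()
print(rows)
```

The `UNINDEXED` marker keeps the timestamp out of the full-text index while still storing it alongside each row.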
3. The "Work in Progress" (WIP) Stage

There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:

- Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
- Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
- Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.

4. Understanding the Folder Structure

All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:

- The SQLite Database: This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
- The data folders: This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
- Temp Files: If you see rapidly changing files, temporary chunks, or locked database journals, this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.

Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
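The done-vs-in-progress heuristic above can be sketched as a small classifier over filenames. The suffixes are assumptions: SQLite journals conventionally end in -wal or -journal while a write is in flight, and finished media lands as .mp4/.jpg:

```python
import os
import tempfile

def classify(name):
    """Rough classification of a ScreenPipe data file by its name.
    Suffixes here are assumptions about naming conventions, not a
    documented ScreenPipe contract."""
    if name.endswith((".sqlite-wal", ".sqlite-journal", ".tmp")):
        return "in progress"
    if name.endswith((".mp4", ".jpg", ".sqlite")):
        return "done"
    return "unknown"

# Demo against a throwaway directory instead of the real ~/.screenpipe
with tempfile.TemporaryDirectory() as d:
    for f in ["db.sqlite", "db.sqlite-wal",
              "System Audio (output)_2026-05-11_06-17-14.mp4"]:
        open(os.path.join(d, f), "w").close()
    for f in sorted(os.listdir(d)):
        print(f, "->", classify(f))
```

A directory with no "in progress" entries suggests the transcription queue has caught up.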
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?

Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.

Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.

1. The "Time Machine" Playback

The primary reason ScreenPipe keeps these files is for audio playback.
When Whisper transcribes your meetings or ambient audio, it writes the text into the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"database along with","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"precise timestamps and file path references","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. Re-processing and Model Upgrades","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. Re-processing and Model Upgrades","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Local LLMs and transcription models are improving rapidly. 
Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The Source of Truth for Hallucinations","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The Source of Truth for Hallucinations","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. 
If a database query returns a completely nonsensical sentence, that","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file is your fallback source of truth to verify what was actually said.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How the Database Views Them","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How the Database Views Them","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you were to open up your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"and query the tables (e.g.,","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"audio_transcriptions","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), you would see that the database treats the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folder essentially as a blob storage backend. 
The tables contain columns acting as soft foreign keys pointing to filenames like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"soundcore AeroClip (input)_2026-05-12_07-40-48.mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"What happens if you delete them?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you manually","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"rm","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. 
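Because deleted media leaves dangling path references behind, a quick integrity check can list transcripts whose audio file is gone. This is a sketch only: the table name audio_transcriptions and column file_path are assumptions about the schema:

```python
import os
import sqlite3
import tempfile

# Sketch: list transcript rows whose referenced media file no longer exists.
# Table/column names are assumed for illustration, not confirmed schema.
def missing_media(db, data_dir):
    rows = db.execute("SELECT DISTINCT file_path FROM audio_transcriptions")
    return [p for (p,) in rows
            if not os.path.exists(os.path.join(data_dir, p))]

# Demo with a throwaway database and directory
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE audio_transcriptions (file_path TEXT, text TEXT)")
db.executemany("INSERT INTO audio_transcriptions VALUES (?, ?)",
               [("kept.mp4", "hello"), ("deleted.mp4", "world")])
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "kept.mp4"), "w").close()
    gone = missing_media(db, d)
print(gone)
```

Rows that show up here would still be searchable as text but would fail silently on playback.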
However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.

Managing the Storage Footprint

Because ScreenPipe records continuously, this folder will inevitably grow over time.

- Built-in Garbage Collection: ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
- Archiving: If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly move older files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
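The archive-and-symlink idea can be sketched as follows. Paths and the 30-day cutoff are illustrative assumptions; the key point is that the symlink keeps the path recorded in db.sqlite resolvable:

```python
import os
import tempfile
import time

def archive_old_media(data_dir, archive_dir, max_age_days=30):
    """Move .mp4 media older than max_age_days to archive_dir and leave
    a symlink behind, so file paths stored in the database still resolve.
    A sketch of the idea above, not a drop-in tool."""
    cutoff = time.time() - max_age_days * 86_400
    for name in os.listdir(data_dir):
        src = os.path.join(data_dir, name)
        if not (name.endswith(".mp4") and os.path.isfile(src)):
            continue
        if os.path.islink(src) or os.path.getmtime(src) > cutoff:
            continue  # already a symlink, or still fresh
        dst = os.path.join(archive_dir, name)
        os.rename(src, dst)   # same-filesystem move; use shutil.move across mounts
        os.symlink(dst, src)  # old path now points at the archived copy

# Demo against throwaway directories instead of ~/.screenpipe/data and a NAS
data, nas = tempfile.mkdtemp(), tempfile.mkdtemp()
old = os.path.join(data, "System Audio (output)_2026-05-11_06-17-14.mp4")
open(old, "w").close()
os.utime(old, (time.time() - 40 * 86_400,) * 2)  # pretend it's 40 days old
archive_old_media(data, nas)
print(os.path.islink(old))
```

A nightly cron entry invoking this script would do the rest; rsync-based variants work equally well if the NAS is a separate mount.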
You said
yes I will do that. Is there a way to setup languages to transcription?

Gemini said
That is a solid plan.
Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.

As for setting up languages for transcription, yes, you can configure it, but it helps to understand how ScreenPipe handles language natively first.

How the Transcription Engine Handles Language

By default, ScreenPipe uses a multilingual Whisper model and sets the language configuration to Auto-Detect. When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.

- The Advantage: If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
- The Drawback: Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.

How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.

If you are using the Desktop App UI:

1. Open the ScreenPipe settings.
2. Navigate to the Audio or AI/Transcription section.
3. Look for the Whisper Language or Transcription Language dropdown.
4. Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).

If you are running ScreenPipe via CLI/Config:

You can modify your underlying configuration (usually found in ~/.screenpipe/pipe.json or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:

- sk (Slovak)
- bg (Bulgarian)
- en (English)

If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":23,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Redo","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Share and export","depth":22,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":22,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXTextArea","text":"Ask Gemini","depth":20,"bounds":{"left":0.08211436,"top":0.83439744,"width":0.22573139,"height":0.01915403},"on_screen":true,"value":"Ask Gemini","help_text":"","role_description":"text entry area","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Ask Gemini","depth":21,"bounds":{"left":0.08211436,"top":0.8347965,"width":0.030086435,"height":0.018355945},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Open upload file 
menu","depth":20,"bounds":{"left":0.078125,"top":0.87031126,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Tools","depth":18,"bounds":{"left":0.094082445,"top":0.87031126,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Open mode picker","depth":20,"bounds":{"left":0.27044547,"top":0.867917,"width":0.026097074,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Pro","depth":23,"bounds":{"left":0.2757646,"top":0.87669593,"width":0.007480053,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Microphone","depth":19,"bounds":{"left":0.29853722,"top":0.867917,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Send message","depth":19,"bounds":{"left":0.30485374,"top":0.8671189,"width":0.013962766,"height":0.033519555},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":false,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini is AI and can make mistakes, including about people.","depth":17,"bounds":{"left":0.11702128,"top":0.92178774,"width":0.11170213,"height":0.012370312},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXLink","text":"Your privacy and Gemini Opens in a new 
window","depth":17,"bounds":{"left":0.2287234,"top":0.92178774,"width":0.044215426,"height":0.012370312},"on_screen":true,"help_text":"","role_description":"link","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Your privacy and Gemini","depth":18,"bounds":{"left":0.2287234,"top":0.92178774,"width":0.044215426,"height":0.012370312},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Opens in a new window","depth":19,"bounds":{"left":0.068484046,"top":0.92098963,"width":0.043218084,"height":0.012370312},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Summarize page","depth":7,"bounds":{"left":0.07413564,"top":0.95730245,"width":0.053523935,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Summarize page","depth":9,"bounds":{"left":0.07978723,"top":0.96249,"width":0.042220745,"height":0.015163607},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Screenpipe [archive.db · 2071.1MB]","depth":7,"bounds":{"left":0.33061835,"top":0.061452515,"width":0.064328454,"height":0.017956903},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Screenpipe","depth":8,"bounds":{"left":0.33061835,"top":0.06304868,"width":0.027759308,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"[archive.db · 
2071.1MB]","depth":9,"bounds":{"left":0.35970744,"top":0.06703911,"width":0.03523936,"height":0.009976057},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Activity","depth":7,"bounds":{"left":0.39960107,"top":0.059856344,"width":0.024767287,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Search","depth":7,"bounds":{"left":0.42503324,"top":0.059856344,"width":0.023603724,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Audio","depth":7,"bounds":{"left":0.44930187,"top":0.059856344,"width":0.021110373,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Work Report","depth":7,"bounds":{"left":0.4710771,"top":0.059856344,"width":0.03507314,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Timetable","depth":7,"bounds":{"left":0.50681514,"top":0.059856344,"width":0.029587766,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI 
Summary","depth":7,"bounds":{"left":0.53706783,"top":0.059856344,"width":0.034242023,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Date","depth":8,"bounds":{"left":0.93866354,"top":0.0650439,"width":0.008144947,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"12","depth":9,"bounds":{"left":0.95545214,"top":0.06464485,"width":0.0048204786,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"/","depth":8,"bounds":{"left":0.96127,"top":0.06464485,"width":0.0023271276,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"05","depth":9,"bounds":{"left":0.9645944,"top":0.06464485,"width":0.0048204786,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"/","depth":8,"bounds":{"left":0.97041225,"top":0.06464485,"width":0.002493351,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2026","depth":9,"bounds":{"left":0.97390294,"top":0.06464485,"width":0.009474734,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Calendar","depth":8,"bounds":{"left":0.9847075,"top":0.0650439,"width":0.0051529254,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXStaticText","text":"Monitor","depth":9,"bounds":{"left":0.45262632,"top":0.10853951,"width":0.013464096,"height":0.010774142},"on_screen":true,"help_text":"","role_descripti
on":"text","subrole":"AXUnknown"}]...
|
4863947415272186562
|
8632611152025234397
|
visual_change
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said
Explain to me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system's audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: As it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
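The FTS5 mechanism described above can be demonstrated in a few lines. This is a generic SQLite sketch, not ScreenPipe's actual schema (its real table and column names may differ):

```python
import sqlite3

# In-memory database standing in for the local store.
con = sqlite3.connect(":memory:")

# FTS5 virtual table: a full-text index over the transcript text.
con.execute("CREATE VIRTUAL TABLE transcripts USING fts5(text, timestamp UNINDEXED)")
con.execute("INSERT INTO transcripts VALUES (?, ?)",
            ("we agreed to ship the beta next friday", "2026-05-12T10:00:00"))
con.execute("INSERT INTO transcripts VALUES (?, ?)",
            ("lunch order for the team", "2026-05-12T12:30:00"))

# MATCH uses the full-text index, so this stays fast even over months of audio.
rows = con.execute(
    "SELECT timestamp, text FROM transcripts WHERE transcripts MATCH ?",
    ("beta",),
).fetchall()
```

The `MATCH` operator is what makes "search three weeks of meetings" an index lookup rather than a linear scan.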
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
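The buffer-to-queue-to-commit flow above can be sketched with a stub transcriber; the chunk names and the `fake_transcribe` function here are purely illustrative, not ScreenPipe's real code:

```python
import queue

# Stage 1 (WIP): buffered audio chunks waiting for the engine.
wip = queue.Queue()
for chunk in ["chunk-001.raw", "chunk-002.raw"]:
    wip.put(chunk)

def fake_transcribe(chunk_name):
    # Stand-in for Whisper; returns (timestamp, text).
    return ("2026-05-12T10:00:00", f"transcript of {chunk_name}")

# Stage 2: drain the queue and "commit" finished text (a list stands in
# for the SQLite database here).
committed = []
while not wip.empty():
    committed.append(fake_transcribe(wip.get()))
```

The key property is that the queue absorbs bursts: audio keeps arriving at real-time speed even when transcription temporarily falls behind.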
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (db.sqlite): This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data folders: This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary chunks, or locked database journals (such as SQLite -wal or -journal files), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
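Putting this together, you could answer "what has been done and what hasn't" by diffing the files on disk against the paths the database references. A sketch under assumptions: the `file_path` column name is a guess (the table name audio_transcriptions appears later in this conversation, but verify both against your own db.sqlite), and the demo uses temp paths so it runs standalone:

```python
import pathlib
import sqlite3
import tempfile

# Demo setup: a fake data folder and database
# (replace with ~/.screenpipe/data and db.sqlite).
root = pathlib.Path(tempfile.mkdtemp())
for name in ["a.mp4", "b.mp4", "c.mp4"]:
    (root / name).touch()

con = sqlite3.connect(":memory:")
# Assumed schema; the real audio_transcriptions table has more columns.
con.execute("CREATE TABLE audio_transcriptions (file_path TEXT, transcription TEXT)")
con.execute("INSERT INTO audio_transcriptions VALUES (?, ?)", (str(root / "a.mp4"), "done"))
con.execute("INSERT INTO audio_transcriptions VALUES (?, ?)", (str(root / "b.mp4"), "done"))

on_disk = {str(p) for p in root.glob("*.mp4")}
transcribed = {row[0] for row in con.execute("SELECT file_path FROM audio_transcriptions")}

# Recorded but not (yet) transcribed: the WIP backlog.
pending = on_disk - transcribed
```

Anything in `pending` is audio that exists as media but has no transcript row yet.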
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text: it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
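That lookup can be sketched as a query returning the media path plus an offset to seek to. The schema here is illustrative (an assumed `offset_seconds` column; check the real db.sqlite for the actual column names):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Illustrative schema: transcript text tied to its source file and offset.
con.execute("""CREATE TABLE audio_transcriptions
               (file_path TEXT, offset_seconds REAL, transcription TEXT)""")
con.execute("INSERT INTO audio_transcriptions VALUES (?, ?, ?)",
            ("System Audio (output)_2026-05-11_06-17-14.mp4", 483.2,
             "let's move the launch to friday"))

# A search hit resolves to which file to open and where to seek.
path, offset = con.execute(
    "SELECT file_path, offset_seconds FROM audio_transcriptions "
    "WHERE transcription LIKE ?", ("%launch%",)).fetchone()
```

The UI then just has to open `path` in a player and seek to `offset` seconds.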
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth to verify what was actually said.
How the Database Views Them
If you were to open up your db.sqlite and query the tables (e.g., audio_transcriptions), you would see that the database treats the ~/.screenpipe/data/ folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like soundcore AeroClip (input)_2026-05-12_07-40-48.mp4.
What happens if you delete them?
If you manually rm those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
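Before deleting anything by hand, you could list which transcripts would become (or already are) orphaned. Again the `file_path` column is an assumed name, and the demo uses temp paths so it runs as-is:

```python
import os
import pathlib
import sqlite3
import tempfile

root = pathlib.Path(tempfile.mkdtemp())
(root / "kept.mp4").touch()          # still on disk
# "deleted.mp4" was rm'ed to free space, so it never gets created here.

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE audio_transcriptions (file_path TEXT)")  # assumed schema
for name in ["kept.mp4", "deleted.mp4"]:
    con.execute("INSERT INTO audio_transcriptions VALUES (?)", (str(root / name),))

# Transcript rows whose audio can no longer be played back.
orphaned = [p for (p,) in con.execute("SELECT file_path FROM audio_transcriptions")
            if not os.path.exists(p)]
```

Rows in `orphaned` keep their searchable text, exactly as described above; only playback is lost.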
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection: ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving: If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly move older files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
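The move-and-symlink trick can be sketched like this; the NAS mount point is a placeholder (demoed on temp directories so it runs standalone), and a real cron script would also filter by file age before moving:

```python
import pathlib
import shutil
import tempfile

local = pathlib.Path(tempfile.mkdtemp())   # stands in for ~/.screenpipe/data
nas = pathlib.Path(tempfile.mkdtemp())     # stands in for the RAID/NAS mount

old_file = local / "System Audio (output)_2026-05-11_06-17-14.mp4"
old_file.write_bytes(b"fake media")

# Move the media to the archive, then leave a symlink behind so the
# path stored in the database keeps resolving.
target = nas / old_file.name
shutil.move(str(old_file), str(target))
old_file.symlink_to(target)

restored = old_file.read_bytes()  # reads transparently through the symlink
```

Because the database stores the original path, playback keeps working as long as the NAS is mounted when the symlink is followed.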
You said
Yes, I will do that. Is there a way to set up languages for transcription?
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription: yes, you can configure it, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
By default, ScreenPipe uses a multilingual Whisper model and sets the language configuration to Auto-Detect. When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage: If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback: Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
1. Open the ScreenPipe settings.
2. Navigate to the Audio or AI/Transcription section.
3. Look for the Whisper Language or Transcription Language dropdown.
4. Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).
If you are running ScreenPipe via CLI/Config:
You can modify your underlying configuration (usually found in ~/.screenpipe/pipe.json or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:
sk (Slovak)
bg (Bulgarian)
en (English)
If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
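If the config-file route applies to your install, a patch script might look like the sketch below. Note that both the pipe.json location and the `language` key are assumptions carried over from the answer above, not a verified ScreenPipe API (demoed on a temp file so it runs as-is):

```python
import json
import pathlib
import tempfile

# Stand-in for ~/.screenpipe/pipe.json; the real key name may differ.
cfg_path = pathlib.Path(tempfile.mkdtemp()) / "pipe.json"
cfg_path.write_text(json.dumps({"language": "auto"}))

cfg = json.loads(cfg_path.read_text())
cfg["language"] = "sk"   # ISO 639-1: sk = Slovak, bg = Bulgarian, en = English
cfg_path.write_text(json.dumps(cfg, indent=2))

result = json.loads(cfg_path.read_text())["language"]
```

Restart the daemon after a change like this so it picks up the new setting; verify the actual key against your installed version's documentation first.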
Good response
Bad response
Redo
Share and export
Copy
Show more options
Ask Gemini
Ask Gemini
Open upload file menu
Tools
Open mode picker
Pro
Microphone
Send message
Gemini is AI and can make mistakes, including about people.
Your privacy and Gemini Opens in a new window
Your privacy and Gemini
Opens in a new window
Summarize page
Summarize page
Screenpipe [archive.db · 2071.1MB]
Screenpipe
[archive.db · 2071.1MB]
Activity
Search
Audio
Work Report
Timetable
AI Summary
Date
12
/
05
/
2026
Calendar
Monitor...
|
NULL
|
NULL
|
NULL
|
NULL
|
|
38821
|
1439
|
84
|
2026-05-14T06:29:19.763576+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740159763_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Expand
Listen
Show thinking
Gemini said
Gemini said
Yes, those
.mp4
files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your
~/.screenpipe/data/
directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the
db.sqlite
database along with
precise timestamps and file path references
.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific
.mp4
file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw
.mp4
files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth to verify what was actually said.
How the Database Views Them...
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
You said
Explain to me: how does the ScreenPipe audio recording work? Where does it start? How does it get transferred and transcribed? Is there a work-in-progress stage? How can I tell from the folder structure what has and hasn't been processed?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.

ScreenPipe is fundamentally designed as a 100% local, privacy-first application: the vast majority of its processing happens directly on your hardware, without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system's audio routing. ScreenPipe simultaneously captures two separate audio streams:

System Audio: everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: everything you say into your mic.

Because ScreenPipe runs as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:

The Engine: the raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure cloud providers like Deepgram for faster processing, but local Whisper is the standard.)
Diarization: as it transcribes, the engine also performs "diarization", the technical term for speaker identification. It analyzes the audio to distinguish your voice from the voices of others, labeling who said what.
Storage: the final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what lets you instantly search for a phrase you heard in a meeting three weeks ago.
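The storage step can be sketched with Python's built-in sqlite3 module. This is a minimal illustration of how FTS5 indexing and matching work, not ScreenPipe's actual schema: the table and column names below are invented for the example.

```python
import sqlite3

# In-memory database standing in for a local transcript store.
con = sqlite3.connect(":memory:")

# An FTS5 virtual table indexes the transcript text for fast phrase search.
con.execute(
    "CREATE VIRTUAL TABLE transcripts USING fts5(text, speaker, timestamp)"
)
con.execute(
    "INSERT INTO transcripts VALUES (?, ?, ?)",
    ("let's ship the quarterly report on Friday", "speaker_0",
     "2026-05-12T06:49:17"),
)

# MATCH runs a full-text query against the index instead of scanning rows.
row = con.execute(
    "SELECT timestamp, speaker FROM transcripts "
    "WHERE transcripts MATCH 'quarterly report'"
).fetchone()
print(row)  # which chunk the phrase is in, and who said it
```

FTS5's MATCH operator is what makes "find a phrase from three weeks ago" effectively instant, because it consults an inverted index rather than reading every stored transcript.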
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:

Buffering: audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. During a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: once the model finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
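The finalization step above can be sketched as: transcribe a buffered chunk, commit the text with a timestamp, then delete the temp file. Everything here is illustrative; the transcriber is a stub standing in for a Whisper call, and the table layout and temp-file name are assumptions, not ScreenPipe internals.

```python
import sqlite3
import tempfile
import time
from pathlib import Path

def finalize_chunk(db, chunk_path, transcribe):
    """Transcribe one buffered audio chunk, commit the text, drop the temp file."""
    text = transcribe(chunk_path)            # the CPU/GPU-bound model call (stubbed)
    db.execute(
        "INSERT INTO transcriptions (path, ts, text) VALUES (?, ?, ?)",
        (str(chunk_path), time.time(), text),
    )
    db.commit()                              # the text is now "done" in the ledger
    chunk_path.unlink()                      # the WIP artifact disappears from disk

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE transcriptions (path TEXT, ts REAL, text TEXT)")

tmp = Path(tempfile.mkdtemp())
chunk = tmp / "chunk_0001.tmp.wav"           # hypothetical temp-chunk name
chunk.write_bytes(b"\x00" * 16)              # pretend buffered audio

finalize_chunk(db, chunk, transcribe=lambda p: "hello from the meeting")
print(chunk.exists())                        # the backlog entry is cleared
```

The commit-then-unlink ordering matters: if the process crashes mid-way, the chunk file survives and can be retried, which is exactly why lingering temp files signal a WIP backlog.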
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux, or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:

The SQLite database (db.sqlite): this is the master ledger. If text exists inside this database, the audio has been fully transcribed, diarized, and is "done."
The data folders: this is where the compressed raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp files: if you see rapidly changing files, temporary chunks, or locked database journals, the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
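A rough way to apply that "done vs. WIP" reading programmatically is to bucket files by suffix. The suffixes below are assumptions based on typical SQLite journal names (-wal, -shm, -journal) and generic temp-file conventions, not documented ScreenPipe filenames:

```python
import tempfile
from pathlib import Path

def classify(data_dir: Path) -> dict:
    """Bucket files into 'done' archives vs. likely WIP artifacts."""
    done, wip = [], []
    for f in sorted(data_dir.iterdir()):
        if f.suffix == ".mp4":
            # Finished .mp4 archives indicate fully committed recordings.
            done.append(f.name)
        elif f.name.endswith((".tmp", "-wal", "-shm", "-journal")):
            # Journals and temp chunks suggest active processing.
            wip.append(f.name)
    return {"done": done, "wip": wip}

root = Path(tempfile.mkdtemp())
(root / "System Audio (output)_2026-05-11_06-17-14.mp4").touch()
(root / "db.sqlite-wal").touch()
report = classify(root)
print(report)
```

Note that a -wal file's mere existence isn't proof of a backlog (SQLite in WAL mode keeps one around during normal operation); it's the combination of journals plus rapidly changing chunk files that signals active processing.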
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see files in ~/.screenpipe/data/ such as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4, MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4, soundcore AeroClip (input)_2026-05-12_07-40-48.mp4, and System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in the SQLite db?

Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.

Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references.

When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text: it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files, you still have the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load against accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
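Such a re-processing pass is structurally just "for each archived .mp4, run the new model and overwrite the old row." A minimal sketch, with a stub transcriber standing in for a larger Whisper checkpoint and an invented table layout (ScreenPipe's real schema may differ):

```python
import sqlite3
import tempfile
from pathlib import Path

def reprocess(db, archive: Path, transcribe):
    """Re-run every archived .mp4 through a (presumably better) model."""
    for mp4 in sorted(archive.glob("*.mp4")):
        db.execute(
            "UPDATE transcriptions SET text = ? WHERE path = ?",
            (transcribe(mp4), str(mp4)),
        )
    db.commit()

archive = Path(tempfile.mkdtemp())
clip = archive / "System Audio (output)_2026-05-11_06-17-14.mp4"
clip.touch()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE transcriptions (path TEXT, text TEXT)")
db.execute("INSERT INTO transcriptions VALUES (?, ?)",
           (str(clip), "old, lower-quality transcript"))

# Stand-in for e.g. loading a heavier Whisper model and transcribing the file.
reprocess(db, archive, transcribe=lambda p: "new, higher-quality transcript")
print(db.execute("SELECT text FROM transcriptions").fetchone()[0])
```

In practice you might write the new transcript to a separate column or table rather than overwriting, so you can diff the old and new models' output before committing to one.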
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, the .mp4 file is your fallback source of truth to verify what was actually said.
How the Database Views Them...
If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). 
Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy 
prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. 
Are these used for anything after transcribing and storing in sqlite db?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Expand","depth":21,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Yes, those","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files are absolutely still used after the transcription process is complete. 
They are not temporary processing chunks; they act as your permanent media archive.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is exactly what those files are doing sitting in your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"directory and how they interact with the rest of the system.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"1. The \"Time Machine\" Playback","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. The \"Time Machine\" Playback","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
|
2859498591895944029
|
9207950348869356183
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said
Explain to me: how does the ScreenPipe audio recording work? Where does it start? How does it get transferred and transcribed? Is there a work-in-progress stage? How can I see from the folder structure what has been done and what hasn't?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system's audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: the raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: as it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage: the final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
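The FTS5 lookup described above can be sketched with Python's built-in sqlite3 module. The table and column names (transcripts, content, timestamp) are illustrative assumptions, not ScreenPipe's actual schema:

```python
import sqlite3

# Throwaway FTS5 index to show how phrase search works.
# Table/column names are illustrative, not ScreenPipe's schema.
con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE transcripts USING fts5(content, timestamp UNINDEXED)")
con.execute("INSERT INTO transcripts VALUES (?, ?)",
            ("we agreed to ship the beta next Friday", "2026-05-12T06:49:17"))
con.execute("INSERT INTO transcripts VALUES (?, ?)",
            ("lunch order for the team", "2026-05-12T12:17:23"))

# MATCH hits the FTS5 index, so this stays fast even over months of audio.
rows = con.execute(
    "SELECT timestamp, content FROM transcripts WHERE transcripts MATCH ?",
    ('"ship the beta"',)).fetchall()
print(rows)  # only the meeting row, found by exact phrase
```

The UNINDEXED column keeps the timestamp out of the full-text index while still letting it ride along with each hit.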
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
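The three WIP steps above can be sketched as a single producer/consumer loop. This is a toy model of the buffering-queue-finalization flow, not ScreenPipe's actual implementation:

```python
import queue

wip = queue.Queue()  # 1. buffering: raw chunks land here as they're captured
db = []              # 3. finalization target (stands in for the SQLite DB)

# The capture side keeps recording regardless of how busy the transcriber is.
for chunk in ["chunk-a.wav", "chunk-b.wav", "chunk-c.wav"]:
    wip.put(chunk)

# 2. the processing queue drains as CPU/GPU time frees up.
while not wip.empty():
    chunk = wip.get()
    text = f"transcript of {chunk}"  # stand-in for a Whisper call
    db.append((chunk, text))         # commit text plus a reference to the audio
    wip.task_done()

print(len(db))  # 3 - backlog fully drained, nothing left "in progress"
```

While the loop is mid-drain, the queue depth is exactly the "WIP" the answer describes: audio that has been heard but not yet saved as text.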
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (db.sqlite): this is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data/ folder: this is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: if you see rapidly changing files, temporary audio chunks, or locked database journals (like -wal or -journal files), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
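That WIP state can be checked from the folder structure with a short heuristic. The sketch below assumes a ~/.screenpipe-style layout with a db.sqlite at the root; the -wal check is generic SQLite behavior, not a documented ScreenPipe interface:

```python
import tempfile
import time
from pathlib import Path

def wip_status(root: Path) -> dict:
    """Heuristic: is anything still 'in progress' under a ScreenPipe-style dir?"""
    # A -wal journal beside the database means SQLite holds uncommitted pages,
    # i.e. something is actively writing (generic SQLite WAL-mode behavior).
    wal_active = (root / "db.sqlite-wal").exists()
    # Files touched in the last two minutes are likely still being written.
    cutoff = time.time() - 120
    hot = sum(1 for p in root.rglob("*") if p.is_file() and p.stat().st_mtime > cutoff)
    return {"wal_active": wal_active, "recently_written": hot}

# Demo against a scratch directory standing in for ~/.screenpipe:
scratch = Path(tempfile.mkdtemp())
(scratch / "db.sqlite-wal").touch()  # simulate an in-flight write batch
print(wip_status(scratch))  # {'wal_active': True, 'recently_written': 1}
```

If both signals are quiet, the engine has caught up and the database reflects everything on disk.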
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in the SQLite DB?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text: it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
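That text-to-audio resolution can be sketched like this. The audio_transcriptions table name is mentioned later in this conversation, but the exact columns (file_path, offset_secs) are assumptions for illustration, not ScreenPipe's documented schema:

```python
import sqlite3

# Minimal stand-in for db.sqlite; column names are illustrative assumptions.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE audio_transcriptions
               (transcription TEXT, file_path TEXT, offset_secs REAL)""")
con.execute("INSERT INTO audio_transcriptions VALUES (?, ?, ?)",
            ("let's finalize the RAID layout",
             "/Users/lukas/.screenpipe/data/System Audio (output)_2026-05-11_06-17-14.mp4",
             754.2))

def locate(keyword: str):
    """Return (file, second) so a player can seek straight to the hit."""
    return con.execute(
        "SELECT file_path, offset_secs FROM audio_transcriptions "
        "WHERE transcription LIKE ?", (f"%{keyword}%",)).fetchone()

path, offset = locate("RAID")
print(f"play {path!r} from {offset:.1f}s")
```

A real UI would do the same lookup and hand the path plus offset to its media player; delete the .mp4 and the row still matches, but the seek has nothing to play.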
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth to verify what was actually said.
How the Database Views Them
If you were to open up your db.sqlite and query the tables (e.g., audio_transcriptions), you would see that the database treats the ~/.screenpipe/data/ folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like soundcore AeroClip (input)_2026-05-12_07-40-48.mp4.
What happens if you delete them?
If you manually rm those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
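Because deletions only break playback silently, it can be worth auditing for dangling references. A sketch, again assuming a file_path column, which is not ScreenPipe's documented schema:

```python
import sqlite3
import tempfile
from pathlib import Path

def orphaned_refs(con: sqlite3.Connection) -> list:
    """File paths the DB still references but that no longer exist on disk."""
    paths = [r[0] for r in con.execute(
        "SELECT DISTINCT file_path FROM audio_transcriptions")]
    return [p for p in paths if not Path(p).exists()]

# Demo DB with one surviving file and one that was rm'd.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE audio_transcriptions (transcription TEXT, file_path TEXT)")
kept = Path(tempfile.mkdtemp()) / "mic_2026-05-12.mp4"
kept.touch()
con.executemany("INSERT INTO audio_transcriptions VALUES (?, ?)",
                [("still here", str(kept)),
                 ("gone", "/nonexistent/System Audio (output)_2026-05-11.mp4")])
print(orphaned_refs(con))  # only the deleted path shows up
```

Running something like this before and after a cleanup tells you exactly which transcripts have lost their playback audio.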
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection: ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving: if you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly move older files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
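The move-and-symlink step could look like the sketch below. The NAS destination and 30-day cutoff are arbitrary assumptions; test against a copy of the data before wiring it into cron:

```python
import os
import shutil
import tempfile
import time
from pathlib import Path

def archive_old_media(local: Path, nas: Path, cutoff_days: int = 30) -> int:
    """Move media older than the cutoff to the NAS and leave symlinks behind,
    so the paths stored in db.sqlite keep resolving for playback."""
    moved = 0
    cutoff = time.time() - cutoff_days * 86400
    nas.mkdir(parents=True, exist_ok=True)
    for f in sorted(local.glob("*.mp4")):
        if f.is_symlink() or f.stat().st_mtime > cutoff:
            continue  # already archived, or still fresh
        dest = nas / f.name
        shutil.move(str(f), str(dest))
        f.symlink_to(dest)  # original path still works
        moved += 1
    return moved

# Demo against scratch dirs standing in for ~/.screenpipe/data and the NAS.
local, nas = Path(tempfile.mkdtemp()), Path(tempfile.mkdtemp()) / "archive"
old = local / "System Audio (output)_2026-05-11_06-17-14.mp4"
old.touch()
os.utime(old, times=(0, 0))           # pretend it's ancient
(local / "fresh.mp4").touch()         # should be left alone
print(archive_old_media(local, nas))  # moves just the old one
```

Because the symlink sits at the exact path the database recorded, playback keeps working as long as the NAS is mounted; a second run moves nothing, since symlinks and fresh files are skipped.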
You said
Yes, I will do that. Is there a way to set up languages for transcription?
conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Explain to me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Explain to me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? 
How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. 
Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. 
How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. 
Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. 
The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. 
If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. 
If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). 
Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy 
prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. 
Are these used for anything after transcribing and storing in sqlite db?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Expand","depth":21,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Yes, those","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files are absolutely still used after the transcription process is complete. 
They are not temporary processing chunks; they act as your permanent media archive.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is exactly what those files are doing sitting in your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"directory and how they interact with the rest of the system.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"1. The \"Time Machine\" Playback","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. The \"Time Machine\" Playback","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The primary reason ScreenPipe keeps these files is for audio playback. 
When Whisper transcribes your meetings or ambient audio, it writes the text into the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"database along with","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"precise timestamps and file path references","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. Re-processing and Model Upgrades","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. Re-processing and Model Upgrades","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Local LLMs and transcription models are improving rapidly. 
Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The Source of Truth for Hallucinations","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The Source of Truth for Hallucinations","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. 
If a database query returns a completely nonsensical sentence, that","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file is your fallback source of truth to verify what was actually said.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How the Database Views Them","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How the Database Views Them","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you were to open up your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"and query the tables (e.g.,","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"audio_transcriptions","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), you would see that the database treats the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folder essentially as a blob storage backend. 
The tables contain columns acting as soft foreign keys pointing to filenames like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"soundcore AeroClip (input)_2026-05-12_07-40-48.mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"What happens if you delete them?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you manually","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"rm","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. 
You said
Explain to me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work-in-progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: the raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: as it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage: the final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
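The FTS5 pattern described above can be sketched in a few lines. This is a minimal in-memory demo of how a full-text index makes transcript search instant; the table and column names (`transcripts`, `text`, `timestamp`) are illustrative, not ScreenPipe's actual schema.

```python
import sqlite3

# In-memory demo of the FTS5 indexing pattern; the real db.sqlite
# schema may differ (names here are illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE transcripts USING fts5(text, timestamp UNINDEXED)"
)
conn.executemany(
    "INSERT INTO transcripts (text, timestamp) VALUES (?, ?)",
    [
        ("let's move the standup to thursday", "2026-05-12T09:00:00"),
        ("the RAID array resync finished overnight", "2026-05-12T10:30:00"),
    ],
)
# MATCH uses the full-text index, so this stays fast even with
# months of accumulated transcriptions.
rows = conn.execute(
    "SELECT timestamp, text FROM transcripts WHERE transcripts MATCH ?",
    ("standup",),
).fetchall()
print(rows)
```

Running this prints the single matching row; the same query shape is what lets a "phrase from three weeks ago" come back immediately instead of requiring a linear scan of all stored text.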
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
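The buffer-queue-finalize flow above is a classic producer/consumer setup. Here is a toy sketch of that shape; every name in it (the chunk dicts, the `transcriber` worker) is invented for illustration and stands in for the real capture and Whisper stages, with a sleep standing in for transcription time.

```python
import queue
import threading
import time

# Toy producer/consumer sketch of the WIP stage described above.
# Names are illustrative, not ScreenPipe's actual code.
chunks = queue.Queue()   # audio chunks waiting on Whisper: the WIP backlog
transcribed = []         # stands in for rows committed to SQLite

def transcriber():
    while True:
        chunk = chunks.get()
        if chunk is None:          # sentinel: recording stopped
            break
        time.sleep(0.01)           # pretend the model is crunching the chunk
        transcribed.append((chunk["ts"], f"text for {chunk['name']}"))

worker = threading.Thread(target=transcriber)
worker.start()
# The capture layer can produce chunks faster than they finish transcribing;
# anything still sitting in the queue is the "work in progress".
for i in range(3):
    chunks.put({"name": f"chunk-{i}", "ts": i})
chunks.put(None)
worker.join()
print(len(transcribed))  # 3
```

During a fast multi-speaker conversation the queue depth grows; once the worker catches up, everything lands in the finalized list, which mirrors how the temp chunk files drain into the database.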
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (db.sqlite): this is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data folders (e.g., ~/.screenpipe/data/): this is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: if you see rapidly changing files, temporary audio chunks, or locked database journals (such as -wal or -journal files), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
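The done-vs-in-flight check described above can be automated with a small inspection helper. This is a sketch under the layout assumptions stated in the answer (db.sqlite at the root, media under data/); the function name and the returned keys are invented, and the demo runs against a throwaway fixture directory so it is runnable anywhere.

```python
from pathlib import Path

def processing_state(root: Path) -> dict:
    """Summarize what is 'done' vs 'in flight' under a ScreenPipe-style
    directory (layout assumed from the description above)."""
    return {
        # The master ledger: if this exists, transcriptions have a home.
        "db_present": (root / "db.sqlite").exists(),
        # -wal / -journal siblings mean SQLite is mid-write: the WIP stage.
        "journals": sorted(p.name for p in root.glob("db.sqlite-*")),
        # Finished media sit in data/ as .mp4 files referenced by the db.
        "archived_mp4s": sorted(p.name for p in (root / "data").glob("*.mp4")),
    }

if __name__ == "__main__":
    import tempfile
    # Build a throwaway fixture so the sketch runs without a real install.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "data").mkdir()
        (root / "db.sqlite").touch()
        (root / "db.sqlite-wal").touch()
        (root / "data" / "System Audio (output)_2026-05-11_06-17-14.mp4").touch()
        print(processing_state(root))
```

Pointed at a real ~/.screenpipe, a non-empty "journals" list would be the hint that the engine is still working through a backlog.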
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text; it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth to verify what was actually said.
How the Database Views Them
If you were to open up your db.sqlite and query the tables (e.g., audio_transcriptions), you would see that the database treats the ~/.screenpipe/data/ folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like soundcore AeroClip (input)_2026-05-12_07-40-48.mp4.
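That "soft foreign key" arrangement is easy to see in miniature. The sketch below builds an in-memory fixture rather than touching a real db.sqlite; the column names (file_path, transcription, timestamp) are assumptions for illustration and may not match the actual audio_transcriptions schema.

```python
import sqlite3

# Fixture demonstrating the soft-foreign-key pattern described above.
# Column names are assumed for illustration, not taken from ScreenPipe.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audio_transcriptions "
    "(file_path TEXT, transcription TEXT, timestamp TEXT)"
)
conn.execute(
    "INSERT INTO audio_transcriptions VALUES (?, ?, ?)",
    (
        "/Users/lukas/.screenpipe/data/soundcore AeroClip (input)_2026-05-12_07-40-48.mp4",
        "ok, let's archive the old recordings",
        "2026-05-12T07:41:02",
    ),
)
# file_path is plain text: nothing enforces that the .mp4 still exists
# on disk, which is why deleting media breaks playback but not search.
row = conn.execute(
    "SELECT file_path, timestamp FROM audio_transcriptions "
    "WHERE transcription LIKE ?",
    ("%archive%",),
).fetchone()
print(row)
```

Because the link is just a stored path string, moving a file behind a symlink (as suggested below for archiving) keeps the reference valid, while deleting the file leaves a dangling path the UI only discovers at playback time.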
What happens if you delete them?
If you manually rm those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection: ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving: if you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly move older .mp4 files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
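The move-then-symlink job suggested above could look roughly like this. It is a sketch, not production code: the function name, the 30-day cutoff, and the directory arguments are all placeholders, and you would want to verify the NAS mount is present before trusting it from cron.

```python
import shutil
import time
from pathlib import Path

def archive_old_media(local_dir: Path, nas_dir: Path, max_age_days: int = 30) -> int:
    """Move .mp4 files older than max_age_days to the NAS and leave
    symlinks behind so the database's stored paths keep resolving.
    Names and the age cutoff are illustrative; adapt before using."""
    cutoff = time.time() - max_age_days * 86400
    moved = 0
    nas_dir.mkdir(parents=True, exist_ok=True)
    for mp4 in local_dir.glob("*.mp4"):
        # Skip files we already archived, and anything still fresh.
        if mp4.is_symlink() or mp4.stat().st_mtime > cutoff:
            continue
        target = nas_dir / mp4.name
        shutil.move(str(mp4), str(target))  # copy+delete works across filesystems
        mp4.symlink_to(target)              # db path now resolves via the link
        moved += 1
    return moved
```

Scheduled nightly from cron, this keeps playback working through the symlinks while the heavy media lives on the RAID array; the is_symlink() guard makes reruns idempotent.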
You said
Yes, I will do that. Is there a way to set up languages for transcription?
The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. 
If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. 
If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). 
Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy 
prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. 
Are these used for anything after transcribing and storing in sqlite db?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Expand","depth":21,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Yes, those","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files are absolutely still used after the transcription process is complete. 
They are not temporary processing chunks; they act as your permanent media archive.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is exactly what those files are doing sitting in your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"directory and how they interact with the rest of the system.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"1. The \"Time Machine\" Playback","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"}]...
|
-6359130848236053429
|
9207950348869356183
|
click
|
accessibility
|
NULL
|
You said
Explain to me: how does the ScreenPipe audio recording work? Where does it start? How is it getting transferred and transcribed? Is there some work-in-progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: As it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
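The FTS5 pattern described above can be sketched in a few lines of Python's built-in sqlite3 module. This is a minimal in-memory demo of how a full-text index makes a phrase searchable; the table and column names here are illustrative, not ScreenPipe's actual schema.

```python
import sqlite3

# In-memory demo of an FTS5 full-text index (illustrative schema, not
# ScreenPipe's real one).
con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE transcripts USING fts5(text, device, timestamp)")
con.execute(
    "INSERT INTO transcripts VALUES "
    "('lets ship the migration on friday', 'MacBook Pro Microphone', '2026-05-12T12:17:23')"
)
# A MATCH query returns hits instantly, even across months of transcripts.
rows = con.execute(
    "SELECT device, timestamp FROM transcripts WHERE transcripts MATCH 'migration'"
).fetchall()
print(rows)
```

The same MATCH syntax works on any FTS5 virtual table, which is why a single keyword can pull up a conversation from weeks ago without scanning every row.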
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
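The buffer-queue-commit flow above is a classic producer/consumer pattern. Here is a minimal sketch of it in Python, where transcribe() is a stand-in for a local Whisper call; nothing here is ScreenPipe's actual code, only the shape of the pipeline.

```python
import queue
import threading

chunks = queue.Queue()   # the "processing queue" of raw audio chunks
committed = []           # stands in for the SQLite commit

def transcribe(chunk):
    return f"transcript of {chunk}"  # placeholder for Whisper inference

def worker():
    # Consumer: drain the queue, transcribe, then "commit" the finished text.
    while True:
        chunk = chunks.get()
        if chunk is None:            # sentinel: recording stopped
            break
        committed.append(transcribe(chunk))

t = threading.Thread(target=worker)
t.start()
for name in ["chunk-001.wav", "chunk-002.wav"]:  # producer: the recorder
    chunks.put(name)
chunks.put(None)
t.join()
print(committed)
```

The queue is exactly why a backlog can build up during a rapid conversation: the producer (capture) is real-time, but the consumer (transcription) is bounded by CPU/GPU speed.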
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (db.sqlite): This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data folders: This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary chunks, or locked database journals, this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
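The triage described in the list above can be automated with a short script. This is a sketch under assumptions: the suffix conventions (.mp4/.jpg for archived media, .tmp and SQLite -wal/-shm/-journal files for in-flight work) are my guesses at what a ScreenPipe-style pipeline leaves on disk, not documented behavior; adjust them to what you actually see.

```python
from pathlib import Path

def classify(root: Path):
    """Split files under root into archived media vs. in-flight artifacts.

    Suffix conventions are assumptions, not documented ScreenPipe behavior.
    """
    done, in_flight = [], []
    for p in sorted(root.rglob("*")):
        if not p.is_file():
            continue
        if p.suffix in {".mp4", ".jpg"}:
            done.append(p)       # permanently archived media
        elif p.suffix == ".tmp" or p.name.endswith(("-wal", "-shm", "-journal")):
            in_flight.append(p)  # likely still being processed (WIP stage)
    return done, in_flight

if __name__ == "__main__":
    done, wip = classify(Path("~/.screenpipe").expanduser())
    print(f"{len(done)} archived files, {len(wip)} in-flight artifacts")
```

Running it while ScreenPipe is catching up on a backlog should show the in-flight count shrinking as chunks get finalized.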
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text; it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth to verify what was actually said.
How the Database Views Them
If you were to open up your db.sqlite and query the tables (e.g., audio_transcriptions), you would see that the database treats the ~/.screenpipe/data/ folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like soundcore AeroClip (input)_2026-05-12_07-40-48.mp4.
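The soft-foreign-key layout described above can be sketched in a few lines: a transcript row stores a plain-text path into the media folder next to the text. The schema, column names, and offset column below are illustrative assumptions, not ScreenPipe's actual schema.

```python
import sqlite3

# Illustrative "soft foreign key" layout: the file path is stored as plain
# text, not enforced by the database (schema is hypothetical).
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE audio_transcriptions "
    "(transcription TEXT, file_path TEXT, offset_seconds REAL)"
)
con.execute(
    "INSERT INTO audio_transcriptions VALUES "
    "('are these used for anything', "
    "'soundcore AeroClip (input)_2026-05-12_07-40-48.mp4', 12.4)"
)
# A search hit resolves back to the exact media file and second for playback:
path, offset = con.execute(
    "SELECT file_path, offset_seconds FROM audio_transcriptions "
    "WHERE transcription LIKE '%used for anything%'"
).fetchone()
print(path, offset)
```

Because the link is just a string, deleting or moving the .mp4 files never breaks the database itself, only the ability to resolve rows back to playable audio.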
What happens if you delete them?
If you manually...
conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? 
How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. 
Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. 
How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. 
Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. 
The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. 
If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. 
If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). 
Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy 
prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. 
Are these used for anything after transcribing and storing in sqlite db?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Expand","depth":21,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Yes, those","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files are absolutely still used after the transcription process is complete. 
They are not temporary processing chunks; they act as your permanent media archive.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is exactly what those files are doing sitting in your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"directory and how they interact with the rest of the system.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"1. The \"Time Machine\" Playback","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. The \"Time Machine\" Playback","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The primary reason ScreenPipe keeps these files is for audio playback. 
When Whisper transcribes your meetings or ambient audio, it writes the text into the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"database along with","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"precise timestamps and file path references","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. Re-processing and Model Upgrades","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. Re-processing and Model Upgrades","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Local LLMs and transcription models are improving rapidly. 
Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The Source of Truth for Hallucinations","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The Source of Truth for Hallucinations","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. 
If a database query returns a completely nonsensical sentence, that","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file is your fallback source of truth to verify what was actually said.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How the Database Views Them","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How the Database Views Them","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you were to open up your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"and query the tables (e.g.,","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"audio_transcriptions","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), you would see that the database treats the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folder essentially as a blob storage backend. 
The tables contain columns acting as soft foreign keys pointing to filenames like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"soundcore AeroClip (input)_2026-05-12_07-40-48.mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"What happens if you delete them?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you manually","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
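The soft-foreign-key layout described above can be probed with a quick query. This is a minimal sketch against a toy database; the table name `audio_transcriptions` and its columns are assumptions for illustration, not ScreenPipe's verified schema:

```python
import sqlite3

# Build a toy database shaped like the layout described above.
# Table and column names are assumptions, not ScreenPipe's actual schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audio_transcriptions ("
    " transcription TEXT, timestamp TEXT, file_path TEXT)"
)
conn.execute(
    "INSERT INTO audio_transcriptions VALUES (?, ?, ?)",
    ("let's sync on the archive", "2026-05-12T07:41:02",
     "soundcore AeroClip (input)_2026-05-12_07-40-48.mp4"),
)

# The text row points back at the media file: a soft foreign key
# the UI can follow to start playback at the right moment.
row = conn.execute(
    "SELECT file_path, timestamp FROM audio_transcriptions"
    " WHERE transcription LIKE ?", ("%archive%",)
).fetchone()
print(row[0])
```

There is no database-level constraint tying the row to the file, which is why deleting an .mp4 silently breaks playback while the transcript row survives.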
Conversation with Gemini
You said
Explain to me: how does the ScreenPipe audio recording work? Where does it start? How is it getting transferred and transcribed? Is there some work-in-progress stage? How can I see from the folder structure what has been done and what hasn't?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio: Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input: Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
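The chunking step above can be sketched as a generator that slices a continuous sample stream into fixed-length windows. The 30-second window and 16 kHz rate here are assumptions for illustration; ScreenPipe's real chunk size may differ:

```python
SAMPLE_RATE = 16_000           # 16 kHz mono, a typical speech-to-text input rate
CHUNK_SECONDS = 30             # assumed window; the real value may differ
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS

def chunk_stream(samples):
    """Slice a continuous run of audio samples into fixed-size chunks.

    The final partial chunk is kept so no audio is dropped.
    """
    for start in range(0, len(samples), CHUNK_SAMPLES):
        yield samples[start:start + CHUNK_SAMPLES]

# 65 seconds of silence -> two full 30 s chunks plus a 5 s remainder
stream = [0] * (SAMPLE_RATE * 65)
chunks = list(chunk_stream(stream))
```

Fixed-size chunks are what let a 24/7 recorder hand bounded units of work to the transcription stage instead of one ever-growing file.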
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine: The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization: As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage: The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
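The FTS5 mechanism can be sketched with Python's built-in sqlite3 module. The table and column names below are illustrative only, not ScreenPipe's actual schema:

```python
import sqlite3

# Illustrative FTS5 sketch; "transcripts" and its columns are example
# names, not ScreenPipe's actual schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE transcripts USING fts5(text, timestamp)")
conn.execute(
    "INSERT INTO transcripts VALUES (?, ?)",
    ("let's move the deadline to Friday", "2026-05-12T12:17:23"),
)
# MATCH queries the inverted index, so lookups stay fast even over
# months of accumulated transcripts.
rows = conn.execute(
    "SELECT timestamp FROM transcripts WHERE transcripts MATCH ?",
    ("deadline",),
).fetchall()
```

A plain `LIKE '%deadline%'` scan would also work but gets slow as the table grows; the FTS index is what makes instant search practical.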
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering: Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue: Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization: Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (db.sqlite): This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The data folders: This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files: If you see rapidly changing files, temporary chunk files, or locked database journals (like db.sqlite-wal or db.sqlite-journal), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
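That file-name heuristic can be sketched as a short script. The -wal/-shm/-journal suffixes are standard SQLite sidecar files; treating .tmp files as in-progress chunks is an assumption, not documented ScreenPipe behavior:

```python
from pathlib import Path

# Suffixes that suggest active processing. The SQLite sidecar names are
# standard; ".tmp" as a chunk suffix is an assumption.
WIP_SUFFIXES = ("-wal", "-shm", "-journal", ".tmp")

def wip_files(directory: Path) -> list[str]:
    """Return file names in `directory` that suggest the WIP stage."""
    return sorted(p.name for p in directory.iterdir()
                  if p.name.endswith(WIP_SUFFIXES))
```

If this returns an empty list and the database modification time is recent, the backlog has most likely been drained.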
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in the SQLite DB?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text; it uses the file path stored in the database to pull up that specific .mp4
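Given the timestamped file names shown above, the playback offset for a search hit is just a datetime subtraction. A minimal sketch, assuming the file name encodes the recording start time:

```python
from datetime import datetime
from pathlib import Path

def file_start(path: str) -> datetime:
    """Recover the recording start time from ScreenPipe-style names
    like 'System Audio (output)_2026-05-11_06-17-14.mp4'."""
    date_part, time_part = Path(path).stem.rsplit("_", 2)[-2:]
    return datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H-%M-%S")

def playback_offset(path: str, spoken_at: datetime) -> float:
    """Seconds into the file where a transcribed keyword was spoken."""
    return (spoken_at - file_start(path)).total_seconds()
```

For a keyword transcribed at 06:18:00 in the System Audio file above, that works out to 46 seconds into the recording.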
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker diarization algorithm is released, having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth to verify what was actually said.
How the Database Views Them
If you were to open up your db.sqlite and query the tables (e.g., audio_transcriptions), you would see that the database treats the ~/.screenpipe/data/ folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like soundcore AeroClip (input)_2026-05-12_07-40-48.mp4.
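Because those references are plain text paths, a short script can audit which referenced files still exist on disk. A sketch, assuming a file_path column in the audio_transcriptions table; check the real schema before relying on these names:

```python
import sqlite3
from pathlib import Path

def missing_media(db_path: str, table: str = "audio_transcriptions",
                  column: str = "file_path") -> list[str]:
    """List media paths the database references that no longer exist.

    The table and column names are assumptions based on this
    conversation, not a verified ScreenPipe schema.
    """
    conn = sqlite3.connect(db_path)
    try:
        paths = [row[0] for row in
                 conn.execute(f"SELECT DISTINCT {column} FROM {table}")]
    finally:
        conn.close()
    return [p for p in paths if p and not Path(p).exists()]
```

Any paths this returns are transcripts whose audio playback would fail.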
What happens if you delete them?
If you manually rm those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection: ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving: If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly move older files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
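The cron-job idea above can be sketched in a few lines. The 30-day cutoff, the *.mp4 glob, and the archive location are all assumptions to adapt to your setup:

```python
import shutil
import time
from pathlib import Path

def archive_old_media(data_dir: Path, archive_dir: Path,
                      max_age_days: int = 30) -> list[Path]:
    """Move media older than max_age_days to archive_dir (e.g., a
    mounted NAS share) and leave symlinks behind so the paths stored
    in the database keep resolving.

    The age cutoff and the *.mp4 glob are illustrative assumptions.
    """
    cutoff = time.time() - max_age_days * 86400
    archive_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for p in sorted(data_dir.glob("*.mp4")):
        if p.stat().st_mtime < cutoff:
            target = archive_dir / p.name
            shutil.move(str(p), str(target))
            p.symlink_to(target)  # the DB's path still works via the link
            moved.append(p)
    return moved
```

Run from cron (or launchd on macOS), this keeps the local folder small while playback continues to work as long as the NAS share is mounted.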
You said
Yes, I will do that. Is there a way to set up languages for transcription?
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription: yes, you can configure it, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
By default, ScreenPipe uses a multilingual Whisper model and sets the language configuration to Auto-Detect. When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage: If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback: Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
Open the ScreenPipe settings.
Navigate to the Audio or AI/Transcription section.
Look for the Whisper Language or Transcription Language dropdown.
Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).
If you are running ScreenPipe via CLI/Config:
You can modify your underlying configuration (usually found in ~/.screenpipe/pipe.json or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:
sk (Slovak)
bg (Bulgarian)
en (English)
If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
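Scripting that config change could look like the sketch below. The "language" key and the idea of editing pipe.json directly are assumptions from this conversation, not verified ScreenPipe config fields, so check the real schema first:

```python
import json
from pathlib import Path

# ISO 639-1 codes for the three languages in question (a certain fact).
ISO_639_1 = {"slovak": "sk", "bulgarian": "bg", "english": "en"}

def force_language(config_path: Path, language: str) -> dict:
    """Write an ISO 639-1 code into a JSON config file.

    Hypothetical sketch: the "language" key is an assumption, not a
    verified ScreenPipe config field. Unknown languages fall back to
    "auto" (auto-detect).
    """
    cfg = json.loads(config_path.read_text()) if config_path.exists() else {}
    cfg["language"] = ISO_639_1.get(language.lower(), "auto")
    config_path.write_text(json.dumps(cfg, indent=2))
    return cfg
```

You would then restart the daemon so the new setting takes effect.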
Good response
Bad response
Redo
Share and export
Copy
Show more options
Ask Gemini
Ask Gemini
Open upload file menu
Tools
Open mode picker
Pro
Microphone
Send message
Gemini is AI and can make mistakes, including about people.
Your privacy and Gemini Opens in a new window
Your privacy and Gemini
Opens in a new window
Summarize page
Summarize page
Screenpipe [archive.db · 2071.1MB]
Screenpipe
[archive.db · 2071.1MB]
Activity
Search
Audio
Work Report...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — 
Archive","depth":5,"bounds":{"left":0.013297873,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: 
db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New 
Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat 
settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share 
conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? 
How can I see from the folder structure what has been done and what hasn't been done?

**Gemini said**

Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine. ScreenPipe is fundamentally designed to be a **100% local, privacy-first** application, meaning the vast majority of its processing happens directly on your hardware, without relying on the cloud.

### 1. Where It Starts: The Capture Layer

The process begins locally by tapping directly into your operating system's audio routing. ScreenPipe captures two separate audio streams simultaneously:

- **System Audio:** everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
- **Microphone Input:** everything you say into your mic.

Because ScreenPipe runs as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.

### 2. How It Gets Transferred and Transcribed

Because ScreenPipe prioritizes local processing, the "transfer" step is very short: the audio data never leaves your machine unless you explicitly set up a cloud plugin. The transcription pipeline looks like this:

- **The engine:** the raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe runs **OpenAI Whisper** locally on your hardware. (You can also configure cloud providers such as Deepgram for faster processing, but local Whisper is the standard.)
- **Diarization:** as it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish your voice from the voices of others, labeling who said what.
- **Storage:** the transcribed text is then indexed into a local **SQLite database**. This database uses Full-Text Search (FTS5), which is what lets you instantly search for a phrase you heard in a meeting three weeks ago.
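The FTS5 indexing described above can be sketched with Python's built-in `sqlite3` module (assuming your Python build ships the FTS5 extension); the table name and columns here are illustrative, not ScreenPipe's real schema:

```python
import sqlite3

# Build a tiny stand-in for the transcript index. The real schema differs;
# the table and column names here are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE transcripts USING fts5(text, timestamp UNINDEXED)"
)
conn.executemany(
    "INSERT INTO transcripts VALUES (?, ?)",
    [
        ("let's move the deadline to Friday", "2026-05-12T09:14:02"),
        ("the RAID array rebuild finished overnight", "2026-05-12T10:01:45"),
    ],
)

# FTS5 MATCH gives near-instant keyword search over months of transcripts.
rows = conn.execute(
    "SELECT timestamp, text FROM transcripts WHERE transcripts MATCH ?",
    ("deadline",),
).fetchall()
print(rows)  # [('2026-05-12T09:14:02', "let's move the deadline to Friday")]
```

The `UNINDEXED` column keeps the timestamp out of the full-text index while still storing it alongside each row.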
### 3. The "Work in Progress" (WIP) Stage

There is a slight lag between *hearing* the audio and *saving* the text, which acts as the WIP stage:

- **Buffering:** audio is recorded into a temporary buffer in your system's RAM, or written as temporary chunk files on disk.
- **Processing queue:** Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
- **Finalization:** once the model finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.

### 4. Understanding the Folder Structure

All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like `~/.screenpipe` on Mac/Linux, or in your AppData folder on Windows). You can tell what has been processed by looking at a few key areas in this directory:

- **The SQLite database (`db.sqlite`):** this is the master ledger. If text exists inside this database, the audio has been fully transcribed, diarized, and is "done."
- **The `data` folder:** this is where the compressed raw audio files and JPEG screenshots are stored permanently. Think of it as the raw archive.
- **Temp files:** rapidly changing files, temporary audio chunks, or locked database journals (such as `db.sqlite-wal` or `db.sqlite-journal`) indicate the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary files are cleared out or moved to permanent storage, and the database updates.

Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
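That "done vs. in flight" check can be scripted roughly as below; the `audio_transcriptions` table and `file_path` column are assumptions about the schema, and the directory here is a throwaway stand-in for `~/.screenpipe`:

```python
import sqlite3
import tempfile
from pathlib import Path

# Sketch of "what's done vs. what's pending", assuming the layout described
# above: media files in data/, a db.sqlite ledger, and -wal/-journal files
# present while the engine is busy. Table/column names are assumptions.
root = Path(tempfile.mkdtemp())
data = root / "data"
data.mkdir()
(data / "System Audio (output)_2026-05-11_06-17-14.mp4").touch()
(data / "MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4").touch()
(root / "db.sqlite-wal").touch()  # presence suggests active processing

db = sqlite3.connect(root / "db.sqlite")
db.execute("CREATE TABLE audio_transcriptions (file_path TEXT, text TEXT)")
db.execute(
    "INSERT INTO audio_transcriptions VALUES (?, ?)",
    (str(data / "System Audio (output)_2026-05-11_06-17-14.mp4"), "hello"),
)
db.commit()

# Media files with no transcript row yet are still "work in progress".
done = {row[0] for row in db.execute("SELECT file_path FROM audio_transcriptions")}
pending = [p.name for p in sorted(data.glob("*.mp4")) if str(p) not in done]
busy = any(root.glob("db.sqlite-*"))  # wal/journal present => backlog

print(pending)  # ['MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4']
print(busy)     # True
```

Pointing the same two checks at the real `~/.screenpipe` directory would show which recordings have not yet made it into the ledger.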
**You said**

I see the files in ~/.screenpipe/data/ as `LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4` or `MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4` or `soundcore AeroClip (input)_2026-05-12_07-40-48.mp4` or `System Audio (output)_2026-05-11_06-17-14.mp4`. Are these used for anything after transcribing and storing in the SQLite db?

**Gemini said**

Yes, those `.mp4` files are still used after transcription is complete. They are not temporary processing chunks; they act as your permanent media archive. Here is what those files are doing in your `~/.screenpipe/data/` directory and how they interact with the rest of the system.

### 1. The "Time Machine" Playback

The primary reason ScreenPipe keeps these files is audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the `db.sqlite` database along with precise timestamps and file-path references. When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text: it uses the file path stored in the database to open that specific `.mp4` and start playing it at the exact second the keyword was spoken. Without these files you keep the transcript, but you lose the ability to listen to the context, tone, or nuance of the original conversation.

### 2. Re-processing and Model Upgrades

Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load against accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker-diarization algorithm is released, having the raw `.mp4` files lets you re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.

### 3. The Source of Truth for Hallucinations

Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that `.mp4` file is your fallback source of truth for verifying what was actually said.

### How the Database Views Them

If you open `db.sqlite` and query the tables (e.g., `audio_transcriptions`), you will see that the database treats the `~/.screenpipe/data/` folder essentially as a blob-storage backend.
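That soft-foreign-key layout can be pictured with a toy query; the column names here (`file_path`, `offset_ms`) are assumptions for illustration, not the actual ScreenPipe schema:

```python
import sqlite3

# Toy version of the blob-storage layout: the DB row carries only a file
# path and an offset; the .mp4 in ~/.screenpipe/data/ holds the audio.
# Column names are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audio_transcriptions (text TEXT, file_path TEXT, offset_ms INTEGER)"
)
conn.execute(
    "INSERT INTO audio_transcriptions VALUES (?, ?, ?)",
    (
        "let's archive it to the NAS",
        "soundcore AeroClip (input)_2026-05-12_07-40-48.mp4",
        754_000,
    ),
)

# A search hit tells the UI which file to open and where to seek playback.
path, offset_ms = conn.execute(
    "SELECT file_path, offset_ms FROM audio_transcriptions WHERE text LIKE ?",
    ("%NAS%",),
).fetchone()
print(path, offset_ms // 1000)  # which .mp4, and roughly how far to seek
```

Nothing enforces that the file still exists, which is exactly why deleting the media breaks playback but not search.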
The tables contain columns acting as soft foreign keys pointing to filenames like `soundcore AeroClip (input)_2026-05-12_07-40-48.mp4`.

**What happens if you delete them?** If you manually `rm` those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all of your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.

### Managing the Storage Footprint

Because ScreenPipe records continuously, this folder will inevitably grow over time.

- **Built-in garbage collection:** ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days, or once the folder hits a specific GB limit).
- **Archiving:** if you prefer to keep a permanent, searchable "life log" without eating up local SSD space, you don't have to let ScreenPipe delete anything. You could script a simple cron job that regularly moves older files to your massive 4-drive RAID 5 array and replaces the local files with symlinks. That keeps the database references intact while offloading the heavy media storage to your NAS.
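A minimal sketch of that cron-driven archive step, assuming the NAS share is mounted locally; the paths and retention window are placeholders:

```python
import os
import shutil
import tempfile
import time
from pathlib import Path

# Move .mp4 files older than a cutoff to the NAS mount and leave a symlink
# behind, so the file paths stored in db.sqlite keep resolving. Directories
# here are throwaway stand-ins for ~/.screenpipe/data and the RAID 5 share.
def archive_old_media(data_dir: Path, nas_dir: Path, max_age_days: float) -> list[Path]:
    cutoff = time.time() - max_age_days * 86400
    moved = []
    for f in data_dir.glob("*.mp4"):
        if f.is_symlink() or f.stat().st_mtime >= cutoff:
            continue  # already archived, or still fresh
        target = nas_dir / f.name
        shutil.move(f, target)  # relocate the media to the NAS
        f.symlink_to(target)    # keep the database reference resolving
        moved.append(target)
    return moved

# Example run with temp dirs standing in for the SSD and the NAS.
data, nas = Path(tempfile.mkdtemp()), Path(tempfile.mkdtemp())
old = data / "System Audio (output)_2026-05-11_06-17-14.mp4"
old.write_bytes(b"...")
os.utime(old, (0, 0))  # pretend the file is ancient
archive_old_media(data, nas, max_age_days=30)
print(old.is_symlink())  # True
```

Scheduled from cron (e.g., nightly), this keeps the local folder lean while playback continues to work as long as the NAS is mounted.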
**You said**

Yes, I will do that. Is there a way to set up languages for transcription?

**Gemini said**

That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS, if you end up building it) will keep your local SSD lean while keeping your database searchable.

As for setting up languages for transcription: **yes, you can configure it**, but it helps to understand how ScreenPipe handles language natively first.

### How the Transcription Engine Handles Language

By default, ScreenPipe uses a **multilingual Whisper model** and sets the language configuration to **Auto-Detect**. When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk under that assumption.

- **The advantage:** if you are watching a tutorial in English and an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe switches gears automatically without you having to touch the settings.
- **The drawback:** auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.

### How to Force a Specific Language

If you want maximum accuracy and know you will primarily be recording in one language for a specific period, you can set the language explicitly. This also slightly speeds up processing, because the model skips the detection phase.

**If you are using the desktop app UI:**

1. Open the ScreenPipe settings.
2. Navigate to the **Audio** or **AI/Transcription** section.
3. Look for the **Whisper Language** or **Transcription Language** dropdown.
4. Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).

**If you are running ScreenPipe via CLI/config:** you can modify the underlying configuration (usually found in `~/.screenpipe/pipe.json`, or by passing a flag when starting the daemon) to force a specific language. You would use the standard ISO language codes:

- `sk` (Slovak)
- `bg` (Bulgarian)
- `en` (English)

If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper "Small" or "Base" instead of "Tiny"), if your hardware has the headroom for it.
button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":23,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Redo","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Share and export","depth":22,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":22,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXTextArea","text":"Ask Gemini","depth":20,"bounds":{"left":0.08211436,"top":0.83439744,"width":0.22573139,"height":0.01915403},"on_screen":true,"value":"Ask Gemini","help_text":"","role_description":"text entry area","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Ask Gemini","depth":21,"bounds":{"left":0.08211436,"top":0.8347965,"width":0.030086435,"height":0.018355945},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Open upload file 
menu","depth":20,"bounds":{"left":0.078125,"top":0.87031126,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Tools","depth":18,"bounds":{"left":0.094082445,"top":0.87031126,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Open mode picker","depth":20,"bounds":{"left":0.27044547,"top":0.867917,"width":0.026097074,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Pro","depth":23,"bounds":{"left":0.2757646,"top":0.87669593,"width":0.007480053,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Microphone","depth":19,"bounds":{"left":0.29853722,"top":0.867917,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Send message","depth":19,"bounds":{"left":0.30485374,"top":0.8671189,"width":0.013962766,"height":0.033519555},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":false,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini is AI and can make mistakes, including about people.","depth":17,"bounds":{"left":0.11702128,"top":0.92178774,"width":0.11170213,"height":0.012370312},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXLink","text":"Your privacy and Gemini Opens in a new 
window","depth":17,"bounds":{"left":0.2287234,"top":0.92178774,"width":0.044215426,"height":0.012370312},"on_screen":true,"help_text":"","role_description":"link","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Your privacy and Gemini","depth":18,"bounds":{"left":0.2287234,"top":0.92178774,"width":0.044215426,"height":0.012370312},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Opens in a new window","depth":19,"bounds":{"left":0.068484046,"top":0.92098963,"width":0.043218084,"height":0.012370312},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Summarize page","depth":7,"bounds":{"left":0.07413564,"top":0.95730245,"width":0.053523935,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Summarize page","depth":9,"bounds":{"left":0.07978723,"top":0.96249,"width":0.042220745,"height":0.015163607},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Screenpipe [archive.db · 2071.1MB]","depth":7,"bounds":{"left":0.33061835,"top":0.061452515,"width":0.064328454,"height":0.017956903},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Screenpipe","depth":8,"bounds":{"left":0.33061835,"top":0.06304868,"width":0.027759308,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"[archive.db · 
2071.1MB]","depth":9,"bounds":{"left":0.35970744,"top":0.06703911,"width":0.03523936,"height":0.009976057},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Activity","depth":7,"bounds":{"left":0.39960107,"top":0.059856344,"width":0.024767287,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Search","depth":7,"bounds":{"left":0.42503324,"top":0.059856344,"width":0.023603724,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Audio","depth":7,"bounds":{"left":0.44930187,"top":0.059856344,"width":0.021110373,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Work Report","depth":7,"bounds":{"left":0.4710771,"top":0.059856344,"width":0.03507314,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false}]...
|
-5038273962338706731
|
8632612251536862173
|
visual_change
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said
Explain to me: how does the ScreenPipe audio recording work? Where does it start? How does it get transferred and transcribed? Is there a work-in-progress stage? How can I tell from the folder structure what has and hasn't been done?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
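The chunking idea above can be sketched in a few lines. This is a hypothetical illustration, not ScreenPipe's actual code: it splits a continuous PCM byte stream into fixed-duration pieces the way a 24/7 recorder hands audio to a transcription stage (the 16 kHz rate and ~30 s window match how Whisper consumes audio).

```python
SAMPLE_RATE = 16_000      # samples per second (Whisper's native rate)
BYTES_PER_SAMPLE = 2      # 16-bit mono PCM
CHUNK_SECONDS = 30        # Whisper processes audio in ~30 s windows

def chunk_pcm(stream: bytes, seconds: int = CHUNK_SECONDS):
    """Yield successive fixed-length slices of a raw PCM buffer."""
    chunk_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * seconds
    for start in range(0, len(stream), chunk_bytes):
        yield stream[start:start + chunk_bytes]

# Example: 65 seconds of silence becomes three chunks (30 s, 30 s, 5 s).
audio = bytes(SAMPLE_RATE * BYTES_PER_SAMPLE * 65)
chunks = list(chunk_pcm(audio))
print([len(c) // (SAMPLE_RATE * BYTES_PER_SAMPLE) for c in chunks])
```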
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage:
The final transcribed text is then indexed into a local SQLite database. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
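A minimal sketch of that storage pattern: transcript text indexed in a SQLite FTS5 table so a phrase can be found with a MATCH query. The table and column names here are illustrative, not ScreenPipe's real schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# FTS5 virtual table: every inserted row is tokenized for full-text search.
db.execute("CREATE VIRTUAL TABLE transcripts USING fts5(text, timestamp)")
db.execute("INSERT INTO transcripts VALUES (?, ?)",
           ("let's move the retro to Thursday", "2026-05-12T06:49:17"))
db.execute("INSERT INTO transcripts VALUES (?, ?)",
           ("the RAID array finished resyncing", "2026-05-12T12:17:23"))

# This MATCH query is what makes "search three weeks back" effectively instant.
rows = db.execute(
    "SELECT timestamp, text FROM transcripts WHERE transcripts MATCH 'retro'"
).fetchall()
print(rows)
```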
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
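The buffer → queue → finalize flow above can be sketched with a worker thread standing in for the Whisper engine. Purely illustrative; the chunk names and "transcript" strings are made up.

```python
import queue
import threading

audio_chunks = queue.Queue()          # the "processing queue"
finished: list[str] = []              # stands in for the SQLite commit

def transcriber():
    # Drain chunks one at a time, like a CPU/GPU-bound Whisper worker.
    while True:
        chunk = audio_chunks.get()
        if chunk is None:             # sentinel: recording stopped
            break
        finished.append(f"transcript of {chunk}")   # fake "Whisper" work

worker = threading.Thread(target=transcriber)
worker.start()

# During a rapid conversation, chunks line up faster than they are processed.
for name in ["chunk-001", "chunk-002", "chunk-003"]:
    audio_chunks.put(name)
audio_chunks.put(None)
worker.join()
print(finished)
```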
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database:
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The media folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary chunks, or locked database journals, this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
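The "done vs. in flight" distinction can be sketched as a directory scan. The `-wal`/`-shm` suffixes are standard SQLite write-ahead-log files; the `.tmp` pattern is an assumption, not a documented ScreenPipe name.

```python
from pathlib import Path
import tempfile

def classify(folder: Path) -> dict[str, list[str]]:
    """Split a ScreenPipe-style data dir into archived media vs. WIP artifacts."""
    buckets = {"archived": [], "in_progress": []}
    for f in sorted(folder.iterdir()):
        if f.suffix in {".mp4", ".jpg"}:
            buckets["archived"].append(f.name)        # fully processed media
        elif f.suffix == ".tmp" or f.name.endswith(("-wal", "-shm")):
            buckets["in_progress"].append(f.name)     # backlog still processing
    return buckets

# Demo with a throwaway directory.
demo = Path(tempfile.mkdtemp())
for name in ["System Audio (output)_2026-05-11_06-17-14.mp4",
             "db.sqlite-wal", "chunk_0042.tmp"]:
    (demo / name).touch()
result = classify(demo)
print(result)
```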
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in the SQLite db?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file path references.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific .mp4 file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
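The seek step can be sketched from the filenames themselves: each one encodes the recording's start time, so a transcript timestamp maps to an offset into the .mp4. The parsing below assumes the `Device (input)_YYYY-MM-DD_HH-MM-SS.mp4` pattern visible in your directory; it is an illustration, not ScreenPipe's actual lookup code.

```python
from datetime import datetime

def seek_offset(filename: str, spoken_at: datetime) -> float:
    """Seconds into the file where a transcribed phrase was spoken."""
    # e.g. "soundcore AeroClip (input)_2026-05-12_07-40-48.mp4"
    stamp = filename.rsplit(")_", 1)[1].removesuffix(".mp4")
    started = datetime.strptime(stamp, "%Y-%m-%d_%H-%M-%S")
    return (spoken_at - started).total_seconds()

offset = seek_offset(
    "soundcore AeroClip (input)_2026-05-12_07-40-48.mp4",
    datetime(2026, 5, 12, 7, 42, 3),
)
print(offset)   # 75.0 seconds into the recording
```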
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth to verify what was actually said.
How the Database Views Them
If you were to open up your db.sqlite and query the tables (e.g., audio_transcriptions), you would see that the database treats the ~/.screenpipe/data/ folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like soundcore AeroClip (input)_2026-05-12_07-40-48.mp4.
What happens if you delete them?
If you manually rm those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
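That "silent failure" case is easy to audit: check each referenced path against the filesystem. The schema below is illustrative (a mock `audio_transcriptions` table with a hypothetical `file_path` column), not ScreenPipe's real one.

```python
import sqlite3
from pathlib import Path

db = sqlite3.connect(":memory:")
# Mock table standing in for ScreenPipe's transcript storage.
db.execute("CREATE TABLE audio_transcriptions (text TEXT, file_path TEXT)")
db.executemany("INSERT INTO audio_transcriptions VALUES (?, ?)", [
    ("standup notes", "/tmp/definitely-missing-12345.mp4"),
])

# Rows whose media file is gone: text still searchable, playback would fail.
orphans = [
    path for (path,) in db.execute("SELECT file_path FROM audio_transcriptions")
    if not Path(path).exists()
]
print(orphans)
```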
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection:
ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving:
If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly move older files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
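That cron-able archiving idea can be sketched as follows: move media older than a cutoff to NAS-style storage and leave symlinks behind so the database's file paths keep resolving. Temp dirs stand in for the local SSD and the RAID share, and the 30-day cutoff is an illustrative value, not a ScreenPipe default.

```python
import os
import shutil
import tempfile
import time
from pathlib import Path

RETENTION_DAYS = 30

def archive_old_media(local: Path, nas: Path) -> list[str]:
    """Move stale .mp4 files to `nas`, leaving symlinks at the old paths."""
    cutoff = time.time() - RETENTION_DAYS * 86_400
    moved = []
    for f in sorted(local.glob("*.mp4")):
        if f.is_symlink() or f.stat().st_mtime >= cutoff:
            continue                      # already archived, or still fresh
        dest = Path(shutil.move(str(f), str(nas / f.name)))
        f.symlink_to(dest)                # old path still works for playback
        moved.append(f.name)
    return moved

local, nas = Path(tempfile.mkdtemp()), Path(tempfile.mkdtemp())
old = local / "System Audio (output)_2026-05-11_06-17-14.mp4"
new = local / "MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4"
old.write_bytes(b"old"); new.write_bytes(b"new")
os.utime(old, (time.time() - 40 * 86_400,) * 2)   # backdate 40 days

moved = archive_old_media(local, nas)
print(moved)
print(old.is_symlink(), old.read_bytes() == b"old")
```

Note the symlink only helps if ScreenPipe follows symlinks when opening media for playback, which is worth verifying before deleting anything.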
You said
Yes, I will do that. Is there a way to set up languages for transcription?
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription, yes, you can configure it, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
By default, ScreenPipe uses a multilingual Whisper model and sets the language configuration to Auto-Detect.
When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage:
If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback:
Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
Open the ScreenPipe settings.
Navigate to the Audio or AI/Transcription section.
Look for the Whisper Language or Transcription Language dropdown.
Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).
If you are running ScreenPipe via CLI/Config:
You can modify your underlying configuration (usually found in ~/.screenpipe/pipe.json or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:
sk (Slovak)
bg (Bulgarian)
en (English)
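As a purely hypothetical illustration of where such a code would plug in: the `"language"` key below is an assumption, not a documented ScreenPipe setting — check the project's docs for the real flag or schema before editing your config. The snippet only shows writing and reading an ISO 639-1 code in a pipe.json-style file.

```python
import json
import tempfile
from pathlib import Path

config_path = Path(tempfile.mkdtemp()) / "pipe.json"
# "language" is a hypothetical key used here only to illustrate the idea.
config_path.write_text(json.dumps({"language": "sk"}, indent=2))  # force Slovak

forced = json.loads(config_path.read_text())["language"]
print(forced)
```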
If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
Good response
Bad response
Redo
Share and export
Copy
Show more options
Ask Gemini
Ask Gemini
Open upload file menu
Tools
Open mode picker
Pro
Microphone
Send message
Gemini is AI and can make mistakes, including about people.
Your privacy and Gemini Opens in a new window
Your privacy and Gemini
Opens in a new window
Summarize page
Summarize page
Screenpipe [archive.db · 2071.1MB]
Screenpipe
[archive.db · 2071.1MB]
Activity
Search
Audio
Work Report...
|
NULL
|
NULL
|
NULL
|
NULL
|
|
38810
|
1439
|
78
|
2026-05-14T06:29:15.245236+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740155245_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Expand
Listen
Show thinking
Gemini said
Gemini said
Yes, those
.mp4
files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your
~/.screenpipe/data/
directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the
db.sqlite
database along with
precise timestamps and file path references
.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific
.mp4
file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw
.mp4
files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that
.mp4
file is your fallback source of truth to verify what was actually said.
How the Database Views Them
How the Database Views Them
If you were to open up your...
You said
Explain to me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work-in-progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine. ScreenPipe is fundamentally designed to be a 100% local, privacy-first application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system's audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
Because ScreenPipe runs as a 24/7 background process, it listens to these streams continuously and splits the audio into manageable chunks to prepare them for the next stage.
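The chunking step can be sketched in a few lines of Python. Note that the 30-second chunk length and 16 kHz sample rate below are illustrative assumptions for the example, not ScreenPipe's actual settings:

```python
# Minimal sketch of cutting a continuous audio stream into fixed-size
# chunks before transcription. Chunk length and sample rate are
# illustrative assumptions, not ScreenPipe's real configuration.

SAMPLE_RATE = 16_000          # samples per second (assumed)
CHUNK_SECONDS = 30            # chunk length fed to the STT model (assumed)
SAMPLES_PER_CHUNK = SAMPLE_RATE * CHUNK_SECONDS

def chunk_stream(samples):
    """Yield fixed-size chunks from an endless iterable of samples."""
    buffer = []
    for s in samples:
        buffer.append(s)
        if len(buffer) == SAMPLES_PER_CHUNK:
            yield buffer
            buffer = []
    if buffer:                # flush the partial tail chunk
        yield buffer

# A fake 65-second "recording": two full chunks plus a 5-second tail.
fake_audio = range(SAMPLE_RATE * 65)
chunks = list(chunk_stream(fake_audio))
print(len(chunks))            # 3
```

In a real recorder the buffer would hold PCM frames arriving from the OS audio callback, but the slicing logic is the same.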
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is extremely short: the audio data never leaves your machine unless you explicitly set up a cloud plugin. Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model. By default, ScreenPipe uses OpenAI Whisper running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard.)
Diarization:
As it transcribes the text, the engine also performs "diarization", a technical term for speaker identification. It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
Storage:
The final transcribed text is then indexed into a local SQLite database. This database uses full-text search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
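To illustrate why FTS5 makes that search instant, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names are invented for the example and are not ScreenPipe's real schema:

```python
import sqlite3

# Illustrative FTS5 table: speaker + transcribed text, with the
# timestamp stored but not indexed for search.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE transcripts USING fts5(speaker, text, timestamp UNINDEXED)"
)
conn.execute(
    "INSERT INTO transcripts VALUES "
    "('me', 'let us ship the quarterly report on Friday', '2026-05-12T06:49:17')"
)
conn.execute(
    "INSERT INTO transcripts VALUES "
    "('colleague', 'the NAS backup finished overnight', '2026-05-12T07:40:48')"
)

# MATCH uses the full-text index, so this stays fast even across
# weeks of accumulated rows.
rows = conn.execute(
    "SELECT speaker, timestamp FROM transcripts WHERE transcripts MATCH 'quarterly report'"
).fetchall()
print(rows)   # [('me', '2026-05-12T06:49:17')]
```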
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between hearing the audio and saving the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
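The buffer → queue → finalize flow above can be sketched as follows. The transcribe() stub stands in for the local Whisper call, and all table and chunk names are illustrative:

```python
import queue
import sqlite3
import time

def transcribe(chunk):
    """Placeholder for the CPU/GPU-bound speech-to-text step."""
    return f"transcript of {chunk}"

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE transcripts (ts REAL, text TEXT)")

# The processing backlog: raw chunks line up here while the model
# is busy (the WIP stage described above).
work = queue.Queue()
for name in ("chunk_001", "chunk_002", "chunk_003"):
    work.put(name)

while not work.empty():
    chunk = work.get()
    text = transcribe(chunk)
    # Finalization: text committed alongside a timestamp.
    db.execute("INSERT INTO transcripts VALUES (?, ?)", (time.time(), text))
    db.commit()

done = db.execute("SELECT COUNT(*) FROM transcripts").fetchone()[0]
print(done)   # 3
```

In the real system the producer (audio capture) and consumer (Whisper) would run concurrently, but the queue-and-commit shape is the same.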
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like ~/.screenpipe on Mac/Linux, or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (db.sqlite):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done".
The data folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary chunks, or locked database journals, this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
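A rough filename heuristic for telling "done" from "in progress" could look like this. SQLite's write-ahead-log journals really are named <db>-wal and <db>-shm, but the .tmp chunk suffix here is an assumption, not a documented ScreenPipe convention:

```python
# Classify files in a ScreenPipe-style data directory by name.
# Journal suffixes (-wal, -shm, -journal) are real SQLite conventions;
# the .tmp suffix for in-flight chunks is an illustrative assumption.

def classify(name: str) -> str:
    if name.endswith(("-wal", "-shm", "-journal")):
        return "WIP: database journal (writes in flight)"
    if name.endswith(".tmp"):
        return "WIP: temporary chunk awaiting transcription"
    if name.endswith(".mp4"):
        return "done: permanent media archive"
    if name.endswith((".sqlite", ".db")):
        return "done: master ledger"
    return "other"

for f in (
    "db.sqlite",
    "db.sqlite-wal",
    "System Audio (output)_2026-05-11_06-17-14.mp4",
):
    print(f, "->", classify(f))
```

Point it at your own ~/.screenpipe contents (e.g. via pathlib's iterdir) to get a quick done-versus-backlog overview.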
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Gemini said
Yes, those .mp4 files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive. Here is exactly what those files are doing sitting in your ~/.screenpipe/data/ directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the db.sqlite database along with precise timestamps and file-path references. When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text: it uses the file path stored in the database to pull up that specific .mp4 file and start playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model, or if a radically better speaker-diarization algorithm is released, having the raw .mp4 files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
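Such a re-processing pass could iterate the archive directly, since the filenames listed earlier encode device, direction, and timestamp ("<device> (input|output)_YYYY-MM-DD_HH-MM-SS.mp4"). The parser below is a sketch built on that observed pattern:

```python
import re
from datetime import datetime

# Filename pattern observed in ~/.screenpipe/data/, e.g.
# "System Audio (output)_2026-05-11_06-17-14.mp4"
PATTERN = re.compile(
    r"^(?P<device>.+) \((?P<direction>input|output)\)"
    r"_(?P<stamp>\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})\.mp4$"
)

def parse_recording(filename):
    """Return (device, direction, start_time) or None if it doesn't match."""
    m = PATTERN.match(filename)
    if m is None:
        return None
    when = datetime.strptime(m["stamp"], "%Y-%m-%d_%H-%M-%S")
    return m["device"], m["direction"], when

info = parse_recording("System Audio (output)_2026-05-11_06-17-14.mp4")
print(info)
```

A re-transcription job would glob the data directory, parse each name this way, feed the file to the newer model, and overwrite the corresponding rows in the database.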
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that .mp4 file is your fallback source of truth to verify what was actually said.
How the Database Views Them
If you were to open up your...