|
38878
|
1440
|
18
|
2026-05-14T06:30:15.979604+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740215979_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Mute tab
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXButton","text":"Mute tab","depth":5,"bounds":{"left":0.011469414,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.020113032,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"}]...
|
1656801680111661218
|
9212450652042222487
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Mute tab
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?...
|
NULL
|
NULL
|
NULL
|
NULL
|
|
32252
|
1245
|
1
|
2026-05-13T09:12:11.322093+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-13/1778 /Users/lukas/.screenpipe/data/data/2026-05-13/1778663531322_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
New Tab
New Tab
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"New Tab","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"New Tab","depth":5,"bounds":{"left":0.013297873,"top":0.16121309,"width":0.014960106,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.18994413,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.3463687,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.3575419,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.38068634,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.05817819,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"bounds":{"left":0.17885639,"top":0.0,"width":0.0034906915,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.21010639,"height":0.037110932},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.011303191,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"bounds":{"left":0.120678194,"top":0.0,"width":0.008144947,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"bounds":{"left":0.14960106,"top":0.0,"width":0.020944148,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.21542554,"height":0.037110932},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"bounds":{"left":0.09142287,"top":0.029928172,"width":0.028922873,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"bounds":{"left":0.12034574,"top":0.029928172,"width":0.106715426,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"bounds":{"left":0.09142287,"top":0.029928172,"width":0.21991356,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"bounds":{"left":0.09142287,"top":0.050678372,"width":0.22174202,"height":0.057861134},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"bounds":{"left":0.0787899,"top":0.13288109,"width":0.22456782,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"bounds":{"left":0.0787899,"top":0.18715084,"width":0.036402926,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"bounds":{"left":0.091755316,"top":0.19592977,"width":0.017785905,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"bounds":{"left":0.075465426,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"bounds":{"left":0.08610372,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"bounds":{"left":0.09674202,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"bounds":{"left":0.107380316,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"bounds":{"left":0.11801862,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false}]...
|
-4371765138006365901
|
9212450651505343127
|
app_switch
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
New Tab
New Tab
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options...
|
NULL
|
NULL
|
NULL
|
NULL
|
|
32627
|
1251
|
47
|
2026-05-13T09:30:28.281065+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-13/1778 /Users/lukas/.screenpipe/data/data/2026-05-13/1778664628281_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
New Tab
New Tab
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"New Tab","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"New Tab","depth":5,"bounds":{"left":0.013297873,"top":0.16121309,"width":0.014960106,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.18994413,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.3463687,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.3575419,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.38068634,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.05817819,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"bounds":{"left":0.17885639,"top":0.0,"width":0.0034906915,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.21010639,"height":0.037110932},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.011303191,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"bounds":{"left":0.120678194,"top":0.0,"width":0.008144947,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"bounds":{"left":0.14960106,"top":0.0,"width":0.020944148,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.21542554,"height":0.037110932},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"bounds":{"left":0.09142287,"top":0.029928172,"width":0.028922873,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"bounds":{"left":0.12034574,"top":0.029928172,"width":0.106715426,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"bounds":{"left":0.09142287,"top":0.029928172,"width":0.21991356,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"bounds":{"left":0.09142287,"top":0.050678372,"width":0.22174202,"height":0.057861134},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"bounds":{"left":0.0787899,"top":0.13288109,"width":0.22456782,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"bounds":{"left":0.0787899,"top":0.18715084,"width":0.036402926,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"bounds":{"left":0.091755316,"top":0.19592977,"width":0.017785905,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"bounds":{"left":0.075465426,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"bounds":{"left":0.08610372,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"bounds":{"left":0.09674202,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"bounds":{"left":0.107380316,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"bounds":{"left":0.11801862,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy prompt","depth":21,"bounds":{"left":0.14029256,"top":0.292498,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false}]...
|
6751987922156076843
|
9212450651504835223
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
New Tab
New Tab
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt...
|
NULL
|
NULL
|
NULL
|
NULL
|
|
30883
|
1217
|
24
|
2026-05-13T08:06:37.505223+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-13/1778 /Users/lukas/.screenpipe/data/data/2026-05-13/1778659597505_m1.jpg...
|
Firefox
|
DXP4800PLUS-B5F8 — Personal
|
1
|
nas.lakylak.xyz/desktop/#/
|
monitor_1
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Close tab
New Tab
New Tab
Screenpipe — Archive
Screenpipe — Archive
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
Edit
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Redo
Share and export
Copy
Show more options
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4
Open upload file menu
Tools
Open mode picker
Pro
Microphone
Send message
Gemini is AI and can make mistakes, including about people.
Your privacy and Gemini Opens in a new window
Your privacy and Gemini
Opens in a new window
Summarize page
Summarize page
7.5
KB/s
5.4
KB/s...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"New Tab","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"New Tab","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Edit","depth":21,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":23,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":23,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Redo","depth":23,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Share and export","depth":22,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":23,"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":22,"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXTextArea","text":"I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4","depth":20,"on_screen":true,"value":"I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4","help_text":"","role_description":"text entry area","subrole":"AXUnknown","is_enabled":true,"is_focused":true,"is_selected":false},{"role":"AXStaticText","text":"I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4","depth":22,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Open upload file menu","depth":20,"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Tools","depth":18,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Open mode picker","depth":20,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Pro","depth":23,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Microphone","depth":19,"on_screen":true,"role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Send message","depth":19,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini is AI and can make mistakes, including about people.","depth":17,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXLink","text":"Your privacy and Gemini Opens in a new window","depth":17,"on_screen":true,"help_text":"","role_description":"link","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Your privacy and Gemini","depth":18,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Opens in a new window","depth":19,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Summarize page","depth":7,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Summarize page","depth":9,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"","depth":18,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"7.5","depth":16,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"KB/s","depth":16,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"5.4","depth":16,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"KB/s","depth":16,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
|
-2459026422685731648
|
9212450649357724567
|
app_switch
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Close tab
New Tab
New Tab
Screenpipe — Archive
Screenpipe — Archive
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
Edit
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Redo
Share and export
Copy
Show more options
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4
Open upload file menu
Tools
Open mode picker
Pro
Microphone
Send message
Gemini is AI and can make mistakes, including about people.
Your privacy and Gemini Opens in a new window
Your privacy and Gemini
Opens in a new window
Summarize page
Summarize page
7.5
KB/s
5.4
KB/s...
|
NULL
|
NULL
|
NULL
|
NULL
|
|
38864
|
1440
|
10
|
2026-05-14T06:30:05.218488+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740205218_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Mute tab
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXButton","text":"Mute tab","depth":5,"bounds":{"left":0.011469414,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.020113032,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false}]...
|
2277958729343415079
|
9212449551993731735
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Mute tab
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt...
|
NULL
|
NULL
|
NULL
|
NULL
|
|
38885
|
1438
|
0
|
2026-05-14T06:30:19.194998+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740219194_m1.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_1
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Mute tab
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXButton","text":"Mute tab","depth":5,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false}]...
|
2277958729343415079
|
9212449551993731735
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Mute tab
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt...
|
38884
|
NULL
|
NULL
|
NULL
|
|
38966
|
1442
|
13
|
2026-05-14T06:31:37.255420+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740297255_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Mute tab
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXButton","text":"Mute tab","depth":5,"bounds":{"left":0.011469414,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.020113032,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false}]...
|
2277958729343415079
|
9212449551993731735
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Mute tab
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt...
|
NULL
|
NULL
|
NULL
|
NULL
|
|
38897
|
1440
|
28
|
2026-05-14T06:30:28.891390+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740228891_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false}]...
|
7628314054346390763
|
9212449551993698967
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt...
|
NULL
|
NULL
|
NULL
|
NULL
|
|
30798
|
1216
|
25
|
2026-05-13T08:03:16.059469+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-13/1778 /Users/lukas/.screenpipe/data/data/2026-05-13/1778659396059_m2.jpg...
|
Firefox
|
DXP4800PLUS-B5F8 — Personal
|
1
|
nas.lakylak.xyz/desktop/#/
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Close tab
New Tab
New Tab
Screenpipe — Archive
Screenpipe — Archive
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
Edit
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates....
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.1245012,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"New Tab","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"New Tab","depth":5,"bounds":{"left":0.013297873,"top":0.16121309,"width":0.014960106,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.3463687,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.3575419,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.38068634,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"bounds":{"left":0.12566489,"top":0.18355946,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Edit","depth":21,"bounds":{"left":0.14029256,"top":0.18355946,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"bounds":{"left":0.16023937,"top":0.19313647,"width":0.15026596,"height":0.09577015},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"bounds":{"left":0.068484046,"top":0.19393456,"width":0.019946808,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"bounds":{"left":0.16023937,"top":0.19473264,"width":0.14860372,"height":0.09297685},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"bounds":{"left":0.3025266,"top":0.32083002,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"bounds":{"left":0.09208777,"top":0.32322428,"width":0.030917553,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"bounds":{"left":0.08976064,"top":0.36472467,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"bounds":{"left":0.08976064,"top":0.36671987,"width":0.04105718,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"bounds":{"left":0.0787899,"top":0.367917,"width":0.22839096,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"bounds":{"left":0.0787899,"top":0.41819632,"width":0.11153591,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"bounds":{"left":0.1903258,"top":0.41819632,"width":0.06382979,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"bounds":{"left":0.0787899,"top":0.41819632,"width":0.22822474,"height":0.057861134},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"bounds":{"left":0.09823803,"top":0.46009576,"width":0.008643617,"height":0.015961692},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"bounds":{"left":0.0787899,"top":0.509976,"width":0.234375,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"bounds":{"left":0.0787899,"top":0.51157224,"width":0.094913565,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"bounds":{"left":0.0787899,"top":0.53790903,"width":0.21575798,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"bounds":{"left":0.09142287,"top":0.58818835,"width":0.036901597,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"bounds":{"left":0.09142287,"top":0.58818835,"width":0.21060506,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"bounds":{"left":0.09142287,"top":0.63846767,"width":0.047539894,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"bounds":{"left":0.13896276,"top":0.63846767,"width":0.08061835,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"bounds":{"left":0.22423537,"top":0.6388667,"width":0.008643617,"height":0.015961692},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"bounds":{"left":0.0787899,"top":0.67198724,"width":0.2330452,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"bounds":{"left":0.0787899,"top":0.73343974,"width":0.234375,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"bounds":{"left":0.0787899,"top":0.7350359,"width":0.11103723,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"bounds":{"left":0.0787899,"top":0.7613727,"width":0.234375,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"bounds":{"left":0.0787899,"top":0.81165206,"width":0.079953454,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"bounds":{"left":0.09142287,"top":0.84118116,"width":0.030418882,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"bounds":{"left":0.12184176,"top":0.84118116,"width":0.15558511,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"bounds":{"left":0.09142287,"top":0.84118116,"width":0.21442819,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"bounds":{"left":0.13297872,"top":0.8619314,"width":0.042054523,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"bounds":{"left":0.09142287,"top":0.8619314,"width":0.21675532,"height":0.057861134},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"bounds":{"left":0.15674867,"top":0.9038308,"width":0.008643617,"height":0.015961692},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"bounds":{"left":0.09142287,"top":0.93296087,"width":0.029753989,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"bounds":{"left":0.09142287,"top":0.93296087,"width":0.21708776,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"bounds":{"left":0.09142287,"top":0.9537111,"width":0.21958111,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"bounds":{"left":0.203125,"top":0.9748603,"width":0.008643617,"height":0.015961692},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"bounds":{"left":0.09142287,"top":1.0,"width":0.021775266,"height":-0.0039904118},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"bounds":{"left":0.11319814,"top":1.0,"width":0.12849069,"height":-0.0039904118},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"bounds":{"left":0.24168883,"top":1.0,"width":0.043218084,"height":-0.0039904118},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"bounds":{"left":0.09142287,"top":1.0,"width":0.21392952,"height":-0.0039904118},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"bounds":{"left":0.0787899,"top":1.0,"width":0.234375,"height":-0.086193085},"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"bounds":{"left":0.0787899,"top":1.0,"width":0.09690824,"height":-0.0877893},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
|
9115870005368710714
|
9212449549845584535
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Close tab
New Tab
New Tab
Screenpipe — Archive
Screenpipe — Archive
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
Edit
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates....
|
30796
|
NULL
|
NULL
|
NULL
|
|
39004
|
1442
|
33
|
2026-05-14T06:32:42.412431+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740362412_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates....
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
|
193450813362110442
|
9212448452481571735
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates....
|
39002
|
NULL
|
NULL
|
NULL
|
|
38899
|
1440
|
29
|
2026-05-14T06:30:29.492384+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740229492_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
|
6321079244253590506
|
9212448450938133143
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like...
|
38897
|
NULL
|
NULL
|
NULL
|
|
38986
|
1442
|
24
|
2026-05-14T06:32:21.696973+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740341696_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
|
6321079244253590506
|
9212448450938133143
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like...
|
NULL
|
NULL
|
NULL
|
NULL
|
|
30937
|
1219
|
5
|
2026-05-13T08:11:40.873611+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-13/1778 /Users/lukas/.screenpipe/data/data/2026-05-13/1778659900873_m1.jpg...
|
Firefox
|
DXP4800PLUS-B5F8 — Personal
|
1
|
nas.lakylak.xyz/desktop/#/
|
monitor_1
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Close tab
New Tab
New Tab
Screenpipe — Archive
Screenpipe — Archive
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"New Tab","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"New Tab","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
|
5945623947886182256
|
9212448450334022279
|
app_switch
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Close tab
New Tab
New Tab
Screenpipe — Archive
Screenpipe — Archive
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like...
|
NULL
|
NULL
|
NULL
|
NULL
|
|
38844
|
NULL
|
0
|
2026-05-14T06:29:42.800749+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740182800_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false}]...
|
-4230104970166636534
|
9212447352970476439
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response...
|
38843
|
NULL
|
NULL
|
NULL
|
|
38946
|
1442
|
4
|
2026-05-14T06:31:22.812856+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740282812_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Mute tab
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXButton","text":"Mute tab","depth":5,"bounds":{"left":0.011469414,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.020113032,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false}]...
|
-5036377224840580351
|
9212447352970476439
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Mute tab
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy...
|
38944
|
NULL
|
NULL
|
NULL
|
|
39022
|
1442
|
42
|
2026-05-14T06:33:01.685980+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740381685_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false}]...
|
-4230104970166636534
|
9212447352970476439
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response...
|
NULL
|
NULL
|
NULL
|
NULL
|
|
38793
|
1439
|
69
|
2026-05-14T06:28:59.467325+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740139467_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Mute tab
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXButton","text":"Mute tab","depth":5,"bounds":{"left":0.011469414,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.020113032,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false}]...
|
-7041800055836932766
|
9212447352970459799
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Mute tab
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options...
|
NULL
|
NULL
|
NULL
|
NULL
|
|
38929
|
1440
|
47
|
2026-05-14T06:31:06.469096+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740266469_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Mute tab
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXButton","text":"Mute tab","depth":5,"bounds":{"left":0.011469414,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.020113032,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false}]...
|
-7041800055836932766
|
9212447352970459799
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Mute tab
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options...
|
38927
|
NULL
|
NULL
|
NULL
|
|
38755
|
1439
|
50
|
2026-05-14T06:28:06.135371+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740086135_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false}]...
|
-3552479194466422982
|
9212447352970443671
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy...
|
NULL
|
NULL
|
NULL
|
NULL
|
|
38804
|
1439
|
75
|
2026-05-14T06:29:12.817031+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740152817_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false}]...
|
2844658784128246394
|
9212447352970443671
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response...
|
NULL
|
NULL
|
NULL
|
NULL
|
|
38839
|
1439
|
94
|
2026-05-14T06:29:30.760358+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740170760_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false}]...
|
2844658784128246394
|
9212447352970443671
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response...
|
38837
|
NULL
|
NULL
|
NULL
|
|
38857
|
1440
|
6
|
2026-05-14T06:29:55.348622+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740195348_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false}]...
|
-3552479194466422982
|
9212447352970443671
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy...
|
NULL
|
NULL
|
NULL
|
NULL
|
|
38860
|
1440
|
8
|
2026-05-14T06:29:58.339749+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740198339_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false}]...
|
3876630403267576135
|
9212447352970443671
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources...
|
NULL
|
NULL
|
NULL
|
NULL
|
|
38895
|
1440
|
27
|
2026-05-14T06:30:28.354015+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740228354_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false}]...
|
-3552479194466422982
|
9212447352970443671
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy...
|
38893
|
NULL
|
NULL
|
NULL
|
|
38954
|
1442
|
8
|
2026-05-14T06:31:27.892907+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740287892_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
|
-5254147595515345823
|
9212447352970443671
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources...
|
38952
|
NULL
|
NULL
|
NULL
|
|
39028
|
1442
|
45
|
2026-05-14T06:33:04.964047+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740384964_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false}]...
|
5603084410119199502
|
9212447352970443671
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export...
|
39026
|
NULL
|
NULL
|
NULL
|
|
39098
|
1442
|
76
|
2026-05-14T06:35:57.289461+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740557289_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false}]...
|
5603084410119199502
|
9212447352970443671
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export...
|
39095
|
NULL
|
NULL
|
NULL
|
|
39273
|
1444
|
85
|
2026-05-14T06:41:18.530848+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740878530_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false}]...
|
-3552479194466422982
|
9212447352970443671
|
app_switch
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy...
|
NULL
|
NULL
|
NULL
|
NULL
|
|
38754
|
1437
|
39
|
2026-05-14T06:28:06.135352+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740086135_m1.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_1
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false}]...
|
-6122458525774068188
|
9212447352970427287
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options...
|
38752
|
NULL
|
NULL
|
NULL
|
|
38960
|
1442
|
10
|
2026-05-14T06:31:30.019404+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740290019_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false}]...
|
-6122458525774068188
|
9212447352970427287
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options...
|
38957
|
NULL
|
NULL
|
NULL
|
|
39141
|
1443
|
8
|
2026-05-14T06:37:24.427486+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740644427_m1.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_1
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false}]...
|
-6122458525774068188
|
9212447352970427287
|
app_switch
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options...
|
NULL
|
NULL
|
NULL
|
NULL
|
|
38659
|
1435
|
14
|
2026-05-14T06:26:03.961061+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778739963961_m1.jpg...
|
Firefox
|
New Tab — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_1
|
NULL
|
NULL
|
NULL
|
NULL
|
New Tab
about:newtab
Screenpipe — Archive
Screenpi New Tab
about:newtab
Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
New Tab
New Tab
Close tab
Screenpipe — Archive
Screenpipe — Archive
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources...
|
[{"role":"AXStaticText","text& [{"role":"AXStaticText","text":"New Tab","depth":4,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"about:newtab","depth":4,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"New Tab","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"New Tab","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
|
-2585975510035951675
|
9212447352969943703
|
click
|
accessibility
|
NULL
|
New Tab
about:newtab
Screenpipe — Archive
Screenpi New Tab
about:newtab
Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
New Tab
New Tab
Close tab
Screenpipe — Archive
Screenpipe — Archive
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources...
|
NULL
|
NULL
|
NULL
|
NULL
|
|
31993
|
1241
|
24
|
2026-05-13T09:03:11.746985+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-13/1778 /Users/lukas/.screenpipe/data/data/2026-05-13/1778662991746_m2.jpg...
|
Firefox
|
DXP4800PLUS-B5F8 — Personal
|
1
|
nas.lakylak.xyz/desktop/#/
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Close tab
New Tab
New Tab
Screenpipe — Archive
Screenpipe — Archive
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.1245012,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"New Tab","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"New Tab","depth":5,"bounds":{"left":0.013297873,"top":0.16121309,"width":0.014960106,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.3463687,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.3575419,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.38068634,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.05817819,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"bounds":{"left":0.17885639,"top":0.0,"width":0.0034906915,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.21010639,"height":0.037110932},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.011303191,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"bounds":{"left":0.120678194,"top":0.0,"width":0.008144947,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"bounds":{"left":0.14960106,"top":0.0,"width":0.020944148,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.21542554,"height":0.037110932},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"bounds":{"left":0.09142287,"top":0.029928172,"width":0.028922873,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"bounds":{"left":0.12034574,"top":0.029928172,"width":0.106715426,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"bounds":{"left":0.09142287,"top":0.029928172,"width":0.21991356,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"bounds":{"left":0.09142287,"top":0.050678372,"width":0.22174202,"height":0.057861134},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"bounds":{"left":0.0787899,"top":0.13288109,"width":0.22456782,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"bounds":{"left":0.0787899,"top":0.18715084,"width":0.036402926,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"bounds":{"left":0.091755316,"top":0.19592977,"width":0.017785905,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"bounds":{"left":0.075465426,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"bounds":{"left":0.08610372,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false}]...
|
-3837183369185887122
|
9212447350822337431
|
app_switch
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Close tab
New Tab
New Tab
Screenpipe — Archive
Screenpipe — Archive
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response...
|
NULL
|
NULL
|
NULL
|
NULL
|
|
31346
|
1229
|
0
|
2026-05-13T08:35:51.063966+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-13/1778 /Users/lukas/.screenpipe/data/data/2026-05-13/1778661351063_m1.jpg...
|
Windsurf
|
screenpipe [SSH: nas] — screenpipe_sync_files.sh — screenpipe [SSH: nas] — screenpipe_sync_files.sh — Untracked...
|
1
|
NULL
|
monitor_1
|
NULL
|
NULL
|
NULL
|
NULL
|
Explorer (⌥⌘E)
Search (⇧⌘F)
Source Control (⇧⌘ Explorer (⌥⌘E)
Search (⇧⌘F)
Source Control (⇧⌘G) - 19 pending changes
19
Codemaps
DeepWiki
Run and Debug
Remote Explorer
Extensions (⇧⌘X)
Explorer
Explorer
Explorer Section: screenpipe [SSH: nas]
Explorer Section: screenpipe [SSH: nas]
screenpipe [SSH: nas]
#recycle
app
data
lib
screenpipe_sync_db.sh
U
screenpipe_sync_files.sh
U
screenpipe_sync_helpers.sh
U
logs
pipes
.gitignore
app_settings.json
archive.db
archive.db-bak
U
archive.db.bak-pre-installid
U
db.sqlite
db.sqlite-shm
db.sqlite-wal
screenpipe_fts_migrate.sh
U
screenpipe_sync copy.sh
U
screenpipe_sync_old.sh
U
screenpipe_sync_updated.sh
U
screenpipe_sync.sh
M
screenpipe.db
Outline Section
Outline
Outline
Timeline Section
Timeline
Timeline
remote SSH: nas
SSH: nas
screenpipe (Git) - master*, Checkout Branch/Tag...
master*
screenpipe (Git) - Synchronize Changes
No Problems
0
0
Forwarded Ports: 41257, 36613, 33153, 45375
4
Notifications
Windsurf - Settings
Teams, Daily: 0% quota used · Weekly: 68% quota used
Shell Script
LF
UTF-8
Spaces: 2
Ln 7, Col 1 (100 selected)
Screen Reader Optimized
Info: Setting up SSH Host (details): Creating local forwarding server...
Clear
Refactor Sync Script
15h
2h
15m
2m
1m
Show More
Thought
for 1s
The
ocr_text
table has no
id
column - its primary key is
frame_id
. The script incorrectly assumes all tables have an
id
column when creating the unique index.
The fix is in
screenpipe_sync.sh
around line 363-370. The loop needs to check if the table has an
id
column before creating the index:
bash
# Re-run install_id + index setup so freshly-created tables get them too.
for
tbl
in
"${
ALL_SYNC_TABLES
[
@
]}"
;
do
HAS_COL
=
$(
sqlite3
"
$NAS_DB
"
"SELECT COUNT(*) FROM pragma_table_info('
$tbl
') WHERE name='install_id';"
)
if
[
"
$HAS_COL
"
=
"0"
];
then
sqlite3
"
$NAS_DB
"
"ALTER TABLE
$tbl
ADD COLUMN install_id TEXT;"
fi
# Check if table has 'id' column before creating index
HAS_ID
=
$(
sqlite3
"
$NAS_DB
"
"SELECT COUNT(*) FROM pragma_table_info('
$tbl
') WHERE name='id';"
)
if
[
"
$HAS_ID
"
-gt
"0"
];
then
sqlite3
"
$NAS_DB
"
"CREATE UNIQUE INDEX IF NOT EXISTS idx_${
tbl
}_install_pk ON ${
tbl
}(install_id, id);"
else
# For tables without 'id', use rowid instead
sqlite3
"
$NAS_DB
"
"CREATE UNIQUE INDEX IF NOT EXISTS idx_${
tbl
}_install_pk ON ${
tbl
}(install_id, rowid);"
fi
done
The same fix applies to
screenpipe_fts_migrate.sh
at line 106. Both scripts have the same bug.
You'll need to switch to Code mode to apply these changes.
Feedback submitted
fix it
Expand content to scroll
Thoughts
I'll fix both scripts to handle tables without an
id
column.
Read
screenpipe_sync.sh and screenpipe_fts_migrate.sh
Thought
for 1s
screenpipe_sync.sh
+
8
-
1
else
# For tables without 'id', use rowid instead
sqlite3
"
$NAS_DB
"
"CREATE UNIQUE INDEX IF NOT EXISTS idx_${
tbl
}_install_pk ON ${
tbl
}(install_id, rowid);"
fi
done
screenpipe_fts_migrate.sh
+
8
-
1
else
# For tables without 'id', use rowid instead
sqlite3
"
$NAS_DB
"
"CREATE UNIQUE INDEX IF NOT EXISTS idx_${
tbl
}_install_pk ON ${
tbl
}(install_id, rowid);"
fi
fi
Thoughts
Fixed both scripts. They now check if a table has an
id
column before creating the unique index. If the table lacks an
id
column (like
ocr_text
), it uses
rowid
instead.
Changes made:
screenpipe_sync.sh
(line 369-375): Added check for
id
column, fallback to
rowid
screenpipe_fts_migrate.sh
(line 106-113): Added check for
id
column, fallback to
rowid
You can now retry running the sync script from your Mac.
Feedback submitted
lukas@Lukas-Kovaliks-MacBook-Pro-Jiminny ~/.screenpipe $ ~/.screenpipe/screenpipe_sync.sh 2026-05-11
[2026-05-12 20:58:45] ========================================
[2026-05-12 20:58:45] Screenpipe sync starting for: 2026-05-11
[2026-05-12 20:58:45] ========================================
[+00m00s] ▶ Preflight checks
Source DB: OK (4.2G)
NAS mount: OK /Volumes/screenpipe
Archive DB: will be created
Data dir: OK (283 files, 318M)
[+00m00s] ▶ Counting source rows for 2026-05-11
frames: 6857
elements: 672129
ui_events: 7063
ocr_text: 2332
meetings: 1
[+00m00s] ▶ Initialising tables (CREATE IF NOT EXISTS)
creating tables ✓ 0m00s
[+00m00s] ▶ Reconciling NAS schema with source
schema: video_chunks ✓ in sync
schema: frames ✓ in sync
schema: elements ✓ in sync
schema: ocr_text ✓ in sync
schema: ui_events ✓ in sync
schema: meetings ✓ in sync
creating indexes ✓ 0m01s
creating FTS tables ✓ 0m00s
[+00m02s] ▶ Syncing data for 2026-05-11
video_chunks ✓ 0m00s
frames (6857 rows) ✓ 2m37s
ocr_text (2332 rows) ⠋ Parse error near line 3: ambiguous column name: app_name
acted_at") SELECT "frame_id","text","text_json","app_name","ocr_engine","win
error here ---^
Expand content to scroll
Thought
for 3s
The error occurs because the
ocr_text
INSERT uses a JOIN with
frames
, and both tables have an
app_name
column. The script uses unqualified column names, causing SQLite to be ambiguous about which table's
app_name
to select.
The script's comment assumes no column name collisions, but
app_name
exists in both tables. The fix is to qualify all columns in the SELECT with the table alias
o.
.
Let me check the current INSERT statement:
Error while viewing
screenpipe_sync.sh
Thoughts
Read
screenpipe_sync.sh
#L320-334
Thoughts
Searched
ocr_text.*INSERT in /volume1/screenpipe (screenpipe_sync.sh)
Thought
for 2s
Searched
INSERT.*ocr_text in /volume1/screenpipe (screenpipe_sync.sh)
Thought
for 1s
screenpipe_sync.sh
+
2
-
1
INSERT OR IGNORE INTO nas.ocr_text
SELECT o.* FROM main.ocr_text o
SELECT o.frame_id, o.text, o.text_json, o.app_name, o.ocr_engine, o.window_name, o.focused, o.text_length, o.sync_id, o.synced_at, o.redacted_at
FROM main.ocr_text o
JOIN main.frames f ON o.frame_id = f.id
Thoughts
Fixed the
ocr_text
INSERT by explicitly listing columns instead of using
o.*
, which caused the ambiguous column name error with
app_name
existing in both
ocr_text
and
frames
.
Let me also check the
elements
table INSERT since it also uses a JOIN with frames and might have the same issue:
Searched
elements.*INSERT in /volume1/screenpipe (screenpipe_sync.sh)
Thought
for 1s
The script you're running is
~/.screenpipe/screenpipe_sync.sh
on your Mac, but I've been editing
screenpipe_sync.sh
on the NAS. These are different files.
You need to either:
Copy the fixed script from NAS to Mac, or
Apply the same fix to your Mac's script
The fix for the
ocr_text
ambiguous column error is to change:
bash
SELECT
o.
*
FROM
main.ocr_text
o
to:
bash
SELECT
o.frame_id,
o.text,
o.text_json,
o.app_name,
o.ocr_engine,
o.window_name,
o.focused,
o.text_length,
o.sync_id,
o.synced_at,
o.redacted_at
FROM
main.ocr_text
o
Which script are you actually running on your Mac - the one in
~/.screenpipe/
?
Feedback submitted
Now add also audio data. CREATE TABLE audio_chunks (
id INTEGER PRIMARY KEY AUTOINCREMENT,
file_path TEXT NOT NULL,
timestamp TIMESTAMP,
sync_id TEXT,
machine_id TEXT,
synced_at DATETIME,
evicted_at TIMESTAMP DEFAULT NULL
)
CREATE TABLE "audio_transcriptions" (
id INTEGER PRIMARY KEY AUTOINCREMENT,
audio_chunk_id INTEGER NOT NULL,
offset_index INTEGER NOT NULL,
timestamp TIMESTAMP NOT NULL,
transcription TEXT NOT NULL,
device TEXT NOT NULL DEFAULT '',
is_input_device BOOLEAN NOT NULL DEFAULT TRUE,
speaker_id INTEGER,
transcription_engine TEXT NOT NULL DEFAULT 'Whisper',
start_time REAL,
end_time REAL,
text_length INTEGER,
sync_id TEXT,
synced_at DATETIME,
redacted_at INTEGER,
FOREIGN KEY (audio_chunk_id) REFERENCES audio_chunks(id)
) CREATE VIRTUAL TABLE audio_transcriptions_fts USING fts5 (
transcription,
device,
speaker_id,
content='audio_transcriptions',
content_rowid='id',
tokenize='unicode61'
) This is still not included in the script (mac to nas) but it exists on original app (mac). Add it to the script. Also move logs (~/.screenpipe/screenpipe.2026-05-09.0.log) to /logs/screenpipe.2026-05-09.0.log on nas.
Expand content to scroll
Thought
for 1s
I'll add audio data syncing and log copying to the script. Let me first check the current script structure:
Read
screenpipe_sync.sh
Thought
for 3s
I'll add audio data syncing and log copying to the script. Let me make the necessary changes:
Error while editing
screenpipe_sync.sh
Thoughts
screenpipe_sync.sh
+
2
SRC_MEETINGS
=
$(
sqlite3
"
$DB_SRC
"
"SELECT COUNT(*) FROM meetings WHERE date(meeting_start) = '
$TARGET_DATE
';"
)
SRC_ACHUNKS
=
$(
sqlite3
"
$DB_SRC
"
"SELECT COUNT(*) FROM audio_chunks WHERE date(timestamp) = '
$TARGET_DATE
';"
)
SRC_ATRANS
=
$(
sqlite3
"
$DB_SRC
"
"SELECT COUNT(*) FROM audio_transcriptions WHERE audio_chunk_id IN (SELECT id FROM audio_chunks WHERE date(timestamp) = '
$TARGET_DATE
');"
)
Thought
for 1s
screenpipe_sync.sh
+
5
-
3
printf
" %-20s %s\n"
"audio_transcriptions:"
"
$SRC_ATRANS
"
if
[
"...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Explorer (⌥⌘E)","depth":18,"on_screen":true,"role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true,"is_expanded":true},{"role":"AXStaticText","text":"","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXRadioButton","text":"Search (⇧⌘F)","depth":18,"on_screen":true,"role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXStaticText","text":"","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXRadioButton","text":"Source Control (⇧⌘G) - 19 pending changes","depth":18,"on_screen":true,"role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXStaticText","text":"","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"19","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXRadioButton","text":"Codemaps","depth":18,"on_screen":true,"role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXStaticText","text":"","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXRadioButton","text":"DeepWiki","depth":18,"on_screen":true,"role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXRadioButton","text":"Run and Debug","depth":18,"on_screen":true,"role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXStaticText","text":"","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXRadioButton","text":"Remote Explorer","depth":18,"on_screen":true,"role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXStaticText","text":"","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXRadioButton","text":"Extensions (⇧⌘X)","depth":18,"on_screen":true,"role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXStaticText","text":"","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXHeading","text":"Explorer","depth":17,"on_screen":true,"role_description":"heading"},{"role":"AXStaticText","text":"Explorer","depth":18,"on_screen":true,"role_description":"text"},{"role":"AXButton","text":"Explorer Section: screenpipe [SSH: nas]","depth":21,"on_screen":true,"role_description":"button","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":true},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXHeading","text":"Explorer Section: screenpipe [SSH: nas]","depth":22,"on_screen":true,"role_description":"heading"},{"role":"AXStaticText","text":"screenpipe [SSH: nas]","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":26,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"#recycle","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":26,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"app","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":26,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"data","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":26,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"lib","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"screenpipe_sync_db.sh","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"U","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"screenpipe_sync_files.sh","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"U","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"screenpipe_sync_helpers.sh","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"U","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":26,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"logs","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":26,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"pipes","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":".gitignore","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"app_settings.json","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"archive.db","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"archive.db-bak","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"U","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"archive.db.bak-pre-installid","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"U","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"db.sqlite","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"db.sqlite-shm","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"db.sqlite-wal","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"screenpipe_fts_migrate.sh","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"U","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"screenpipe_sync copy.sh","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"U","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"screenpipe_sync_old.sh","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"U","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"screenpipe_sync_updated.sh","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"U","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"screenpipe_sync.sh","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"M","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"screenpipe.db","depth":27,"on_screen":true,"role_description":"text"},{"role":"AXButton","text":"Outline Section","depth":21,"on_screen":true,"role_description":"button","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXHeading","text":"Outline","depth":22,"on_screen":true,"role_description":"heading"},{"role":"AXStaticText","text":"Outline","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXButton","text":"Timeline Section","depth":21,"on_screen":true,"role_description":"button","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXHeading","text":"Timeline","depth":22,"on_screen":true,"role_description":"heading"},{"role":"AXStaticText","text":"Timeline","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXButton","text":"remote SSH: nas","depth":16,"on_screen":true,"role_description":"button","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXStaticText","text":"","depth":17,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"SSH: nas","depth":17,"on_screen":true,"role_description":"text"},{"role":"AXButton","text":"screenpipe (Git) - master*, Checkout Branch/Tag...","depth":16,"on_screen":true,"role_description":"button","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXStaticText","text":"","depth":17,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"master*","depth":17,"on_screen":true,"role_description":"text"},{"role":"AXButton","text":"screenpipe (Git) - Synchronize Changes","depth":16,"on_screen":true,"role_description":"button","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"No Problems","depth":16,"on_screen":true,"role_description":"button","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXStaticText","text":"","depth":17,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"0","depth":17,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":17,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"0","depth":17,"on_screen":true,"role_description":"text"},{"role":"AXButton","text":"Forwarded Ports: 41257, 36613, 33153, 45375","depth":16,"on_screen":true,"role_description":"button","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXStaticText","text":"","depth":17,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"4","depth":17,"on_screen":true,"role_description":"text"},{"role":"AXButton","text":"Notifications","depth":16,"on_screen":true,"role_description":"button","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Windsurf - Settings","depth":16,"on_screen":true,"role_description":"button","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Teams, Daily: 0% quota used · Weekly: 68% quota used","depth":16,"on_screen":true,"role_description":"button","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Shell Script","depth":16,"on_screen":true,"role_description":"button","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"LF","depth":16,"on_screen":true,"role_description":"button","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"UTF-8","depth":16,"on_screen":true,"role_description":"button","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Spaces: 2","depth":16,"on_screen":true,"role_description":"button","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Ln 7, Col 1 (100 selected)","depth":16,"on_screen":true,"role_description":"button","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Screen Reader Optimized","depth":16,"bounds":{"left":1.0,"top":0.0,"width":-0.07847226,"height":0.02},"on_screen":true,"role_description":"button","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXStaticText","text":"Info: Setting up SSH Host (details): Creating local forwarding server...","depth":12,"on_screen":false,"role_description":"text"},{"role":"AXStaticText","text":"Clear","depth":12,"on_screen":false,"role_description":"text"},{"role":"AXStaticText","text":"Refactor Sync Script","depth":20,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"15h","depth":19,"on_screen":false,"role_description":"text"},{"role":"AXStaticText","text":"2h","depth":19,"on_screen":false,"role_description":"text"},{"role":"AXStaticText","text":"15m","depth":19,"on_screen":false,"role_description":"text"},{"role":"AXStaticText","text":"2m","depth":19,"on_screen":false,"role_description":"text"},{"role":"AXStaticText","text":"1m","depth":19,"on_screen":false,"role_description":"text"},{"role":"AXButton","text":"Show More","depth":19,"on_screen":true,"role_description":"button","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXStaticText","text":"Thought","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"for 1s","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"The","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"ocr_text","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"table has no","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"id","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"column - its primary key is","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"frame_id","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":". The script incorrectly assumes all tables have an","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"id","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"column when creating the unique index.","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"The fix is in","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"screenpipe_sync.sh","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"around line 363-370. The loop needs to check if the table has an","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"id","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"column before creating the index:","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"bash","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"# Re-run install_id + index setup so freshly-created tables get them too.","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"for","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"tbl","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"in","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"${","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"ALL_SYNC_TABLES","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"[","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"@","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"]}\"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":";","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"do","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"HAS_COL","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"=","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"$(","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"sqlite3","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"$NAS_DB","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"SELECT COUNT(*) FROM pragma_table_info('","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"$tbl","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"') WHERE name='install_id';\"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":")","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"if","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"[","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"$HAS_COL","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"=","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"0\"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"];","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"then","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"sqlite3","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"$NAS_DB","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"ALTER TABLE","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"$tbl","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"ADD COLUMN install_id TEXT;\"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"fi","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"# Check if table has 'id' column before creating index","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"HAS_ID","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"=","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"$(","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"sqlite3","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"$NAS_DB","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"SELECT COUNT(*) FROM pragma_table_info('","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"$tbl","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"') WHERE name='id';\"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":")","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"if","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"[","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"$HAS_ID","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"-gt","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"0\"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"];","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"then","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"sqlite3","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"$NAS_DB","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"CREATE UNIQUE INDEX IF NOT EXISTS idx_${","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"tbl","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"}_install_pk ON ${","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"tbl","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"}(install_id, id);\"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"else","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"# For tables without 'id', use rowid instead","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"sqlite3","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"$NAS_DB","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"CREATE UNIQUE INDEX IF NOT EXISTS idx_${","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"tbl","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"}_install_pk ON ${","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"tbl","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"}(install_id, rowid);\"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"fi","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"done","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"The same fix applies to","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"screenpipe_fts_migrate.sh","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"at line 106. Both scripts have the same bug.","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"You'll need to switch to Code mode to apply these changes.","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Feedback submitted","depth":20,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"fix it","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXButton","text":"Expand content to scroll","depth":23,"on_screen":true,"role_description":"button","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXStaticText","text":"Thoughts","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"I'll fix both scripts to handle tables without an","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"id","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"column.","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Read","depth":20,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"screenpipe_sync.sh and screenpipe_fts_migrate.sh","depth":20,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Thought","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"for 1s","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"screenpipe_sync.sh","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"+","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"8","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"-","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"1","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"else","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"# For tables without 'id', use rowid instead","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"sqlite3","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"$NAS_DB","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"CREATE UNIQUE INDEX IF NOT EXISTS idx_${","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"tbl","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"}_install_pk ON ${","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"tbl","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"}(install_id, rowid);\"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"fi","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"done","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"screenpipe_fts_migrate.sh","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"+","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"8","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"-","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"1","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"else","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"# For tables without 'id', use rowid instead","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"sqlite3","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"$NAS_DB","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"CREATE UNIQUE INDEX IF NOT EXISTS idx_${","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"tbl","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"}_install_pk ON ${","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"tbl","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"}(install_id, rowid);\"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"fi","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"fi","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Thoughts","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Fixed both scripts. They now check if a table has an","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"id","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"column before creating the unique index. If the table lacks an","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"id","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"column (like","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"ocr_text","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"), it uses","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"rowid","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"instead.","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Changes made:","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"screenpipe_sync.sh","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"(line 369-375): Added check for","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"id","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"column, fallback to","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"rowid","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"screenpipe_fts_migrate.sh","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"(line 106-113): Added check for","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"id","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"column, fallback to","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"rowid","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"You can now retry running the sync script from your Mac.","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Feedback submitted","depth":20,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"lukas@Lukas-Kovaliks-MacBook-Pro-Jiminny ~/.screenpipe $ ~/.screenpipe/screenpipe_sync.sh 2026-05-11","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"[2026-05-12 20:58:45] ========================================","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"[2026-05-12 20:58:45] Screenpipe sync starting for: 2026-05-11","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"[2026-05-12 20:58:45] ========================================","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"[+00m00s] ▶ Preflight checks","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Source DB: OK (4.2G)","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"NAS mount: OK /Volumes/screenpipe","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Archive DB: will be created","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Data dir: OK (283 files, 318M)","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"[+00m00s] ▶ Counting source rows for 2026-05-11","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"frames: 6857","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"elements: 672129","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"ui_events: 7063","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"ocr_text: 2332","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"meetings: 1","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"[+00m00s] ▶ Initialising tables (CREATE IF NOT EXISTS)","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"creating tables ✓ 0m00s","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"[+00m00s] ▶ Reconciling NAS schema with source","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"schema: video_chunks ✓ in sync","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"schema: frames ✓ in sync","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"schema: elements ✓ in sync","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"schema: ocr_text ✓ in sync","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"schema: ui_events ✓ in sync","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"schema: meetings ✓ in sync","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"creating indexes ✓ 0m01s","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"creating FTS tables ✓ 0m00s","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"[+00m02s] ▶ Syncing data for 2026-05-11","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"video_chunks ✓ 0m00s","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"frames (6857 rows) ✓ 2m37s","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"ocr_text (2332 rows) ⠋ Parse error near line 3: ambiguous column name: app_name","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"acted_at\") SELECT \"frame_id\",\"text\",\"text_json\",\"app_name\",\"ocr_engine\",\"win","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"error here ---^","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXButton","text":"Expand content to scroll","depth":23,"on_screen":true,"role_description":"button","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXStaticText","text":"Thought","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"for 3s","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"The error occurs because the","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"ocr_text","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"INSERT uses a JOIN with","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"frames","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":", and both tables have an","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"app_name","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"column. The script uses unqualified column names, causing SQLite to be ambiguous about which table's","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"app_name","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"to select.","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"The script's comment assumes no column name collisions, but","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"app_name","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"exists in both tables. The fix is to qualify all columns in the SELECT with the table alias","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"o.","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":".","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Let me check the current INSERT statement:","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Error while viewing","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"screenpipe_sync.sh","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Thoughts","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Read","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"screenpipe_sync.sh","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"#L320-334","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Thoughts","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Searched","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"ocr_text.*INSERT in /volume1/screenpipe (screenpipe_sync.sh)","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Thought","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"for 2s","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Searched","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"INSERT.*ocr_text in /volume1/screenpipe (screenpipe_sync.sh)","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Thought","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"for 1s","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"screenpipe_sync.sh","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"+","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"2","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"-","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"1","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"INSERT OR IGNORE INTO nas.ocr_text","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"SELECT o.* FROM main.ocr_text o","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"SELECT o.frame_id, o.text, o.text_json, o.app_name, o.ocr_engine, o.window_name, o.focused, o.text_length, o.sync_id, o.synced_at, o.redacted_at","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"FROM main.ocr_text o","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"JOIN main.frames f ON o.frame_id = f.id","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Thoughts","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Fixed the","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"ocr_text","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"INSERT by explicitly listing columns instead of using","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"o.*","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":", which caused the ambiguous column name error with","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"app_name","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"existing in both","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"ocr_text","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"and","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"frames","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":".","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Let me also check the","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"elements","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"table INSERT since it also uses a JOIN with frames and might have the same issue:","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Searched","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"elements.*INSERT in /volume1/screenpipe (screenpipe_sync.sh)","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Thought","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"for 1s","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"The script you're running is","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"~/.screenpipe/screenpipe_sync.sh","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"on your Mac, but I've been editing","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"screenpipe_sync.sh","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"on the NAS. These are different files.","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"You need to either:","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Copy the fixed script from NAS to Mac, or","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Apply the same fix to your Mac's script","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"The fix for the","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"ocr_text","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"ambiguous column error is to change:","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"bash","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"SELECT","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"o.","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"*","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"FROM","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"main.ocr_text","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"o","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"to:","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"bash","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"SELECT","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"o.frame_id,","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"o.text,","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"o.text_json,","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"o.app_name,","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"o.ocr_engine,","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"o.window_name,","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"o.focused,","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"o.text_length,","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"o.sync_id,","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"o.synced_at,","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"o.redacted_at","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"FROM","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"main.ocr_text","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"o","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Which script are you actually running on your Mac - the one in","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"~/.screenpipe/","depth":22,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"?","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Feedback submitted","depth":20,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Now add also audio data. CREATE TABLE audio_chunks (","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"id INTEGER PRIMARY KEY AUTOINCREMENT,","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"file_path TEXT NOT NULL,","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"timestamp TIMESTAMP,","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"sync_id TEXT,","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"machine_id TEXT,","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"synced_at DATETIME,","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"evicted_at TIMESTAMP DEFAULT NULL","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":")","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"CREATE TABLE \"audio_transcriptions\" (","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"id INTEGER PRIMARY KEY AUTOINCREMENT,","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"audio_chunk_id INTEGER NOT NULL,","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"offset_index INTEGER NOT NULL,","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"timestamp TIMESTAMP NOT NULL,","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"transcription TEXT NOT NULL,","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"device TEXT NOT NULL DEFAULT '',","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"is_input_device BOOLEAN NOT NULL DEFAULT TRUE,","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"speaker_id INTEGER,","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"transcription_engine TEXT NOT NULL DEFAULT 'Whisper',","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"start_time REAL,","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"end_time REAL,","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"text_length INTEGER,","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"sync_id TEXT,","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"synced_at DATETIME,","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"redacted_at INTEGER,","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"FOREIGN KEY (audio_chunk_id) REFERENCES audio_chunks(id)","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":") CREATE VIRTUAL TABLE audio_transcriptions_fts USING fts5 (","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"transcription,","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"device,","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"speaker_id,","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"content='audio_transcriptions',","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"content_rowid='id',","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"tokenize='unicode61'","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":") This is still not included in the script (mac to nas) but it exists on original app (mac). Add it to the script. Also move logs (~/.screenpipe/screenpipe.2026-05-09.0.log) to /logs/screenpipe.2026-05-09.0.log on nas.","depth":25,"on_screen":true,"role_description":"text"},{"role":"AXButton","text":"Expand content to scroll","depth":23,"on_screen":true,"role_description":"button","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXStaticText","text":"Thought","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"for 1s","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"I'll add audio data syncing and log copying to the script. Let me first check the current script structure:","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Read","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"screenpipe_sync.sh","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Thought","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"for 3s","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"I'll add audio data syncing and log copying to the script. Let me make the necessary changes:","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Error while editing","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"screenpipe_sync.sh","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Thoughts","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"screenpipe_sync.sh","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"+","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"2","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"SRC_MEETINGS","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"=","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"$(","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"sqlite3","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"$DB_SRC","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"SELECT COUNT(*) FROM meetings WHERE date(meeting_start) = '","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"$TARGET_DATE","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"';\"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":")","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"SRC_ACHUNKS","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"=","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"$(","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"sqlite3","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"$DB_SRC","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"SELECT COUNT(*) FROM audio_chunks WHERE date(timestamp) = '","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"$TARGET_DATE","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"';\"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":")","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"SRC_ATRANS","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"=","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"$(","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"sqlite3","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"$DB_SRC","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"SELECT COUNT(*) FROM audio_transcriptions WHERE audio_chunk_id IN (SELECT id FROM audio_chunks WHERE date(timestamp) = '","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"$TARGET_DATE","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"');\"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":")","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Thought","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"for 1s","depth":21,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"screenpipe_sync.sh","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"+","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"5","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"-","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"3","depth":23,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"printf","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\" %-20s %s\\n\"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"audio_transcriptions:\"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"$SRC_ATRANS","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"if","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"[","depth":24,"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"\"","depth":24,"on_screen":true,"role_description":"text"}]...
|
-2303672677952648048
|
9211583603618071197
|
visual_change
|
accessibility
|
NULL
|
Explorer (⌥⌘E)
Search (⇧⌘F)
Source Control (⇧⌘ Explorer (⌥⌘E)
Search (⇧⌘F)
Source Control (⇧⌘G) - 19 pending changes
19
Codemaps
DeepWiki
Run and Debug
Remote Explorer
Extensions (⇧⌘X)
Explorer
Explorer
Explorer Section: screenpipe [SSH: nas]
Explorer Section: screenpipe [SSH: nas]
screenpipe [SSH: nas]
#recycle
app
data
lib
screenpipe_sync_db.sh
U
screenpipe_sync_files.sh
U
screenpipe_sync_helpers.sh
U
logs
pipes
.gitignore
app_settings.json
archive.db
archive.db-bak
U
archive.db.bak-pre-installid
U
db.sqlite
db.sqlite-shm
db.sqlite-wal
screenpipe_fts_migrate.sh
U
screenpipe_sync copy.sh
U
screenpipe_sync_old.sh
U
screenpipe_sync_updated.sh
U
screenpipe_sync.sh
M
screenpipe.db
Outline Section
Outline
Outline
Timeline Section
Timeline
Timeline
remote SSH: nas
SSH: nas
screenpipe (Git) - master*, Checkout Branch/Tag...
master*
screenpipe (Git) - Synchronize Changes
No Problems
0
0
Forwarded Ports: 41257, 36613, 33153, 45375
4
Notifications
Windsurf - Settings
Teams, Daily: 0% quota used · Weekly: 68% quota used
Shell Script
LF
UTF-8
Spaces: 2
Ln 7, Col 1 (100 selected)
Screen Reader Optimized
Info: Setting up SSH Host (details): Creating local forwarding server...
Clear
Refactor Sync Script
15h
2h
15m
2m
1m
Show More
Thought
for 1s
The
ocr_text
table has no
id
column - its primary key is
frame_id
. The script incorrectly assumes all tables have an
id
column when creating the unique index.
The fix is in
screenpipe_sync.sh
around line 363-370. The loop needs to check if the table has an
id
column before creating the index:
bash
# Re-run install_id + index setup so freshly-created tables get them too.
for
tbl
in
"${
ALL_SYNC_TABLES
[
@
]}"
;
do
HAS_COL
=
$(
sqlite3
"
$NAS_DB
"
"SELECT COUNT(*) FROM pragma_table_info('
$tbl
') WHERE name='install_id';"
)
if
[
"
$HAS_COL
"
=
"0"
];
then
sqlite3
"
$NAS_DB
"
"ALTER TABLE
$tbl
ADD COLUMN install_id TEXT;"
fi
# Check if table has 'id' column before creating index
HAS_ID
=
$(
sqlite3
"
$NAS_DB
"
"SELECT COUNT(*) FROM pragma_table_info('
$tbl
') WHERE name='id';"
)
if
[
"
$HAS_ID
"
-gt
"0"
];
then
sqlite3
"
$NAS_DB
"
"CREATE UNIQUE INDEX IF NOT EXISTS idx_${
tbl
}_install_pk ON ${
tbl
}(install_id, id);"
else
# For tables without 'id', use rowid instead
sqlite3
"
$NAS_DB
"
"CREATE UNIQUE INDEX IF NOT EXISTS idx_${
tbl
}_install_pk ON ${
tbl
}(install_id, rowid);"
fi
done
The same fix applies to
screenpipe_fts_migrate.sh
at line 106. Both scripts have the same bug.
You'll need to switch to Code mode to apply these changes.
Feedback submitted
fix it
Expand content to scroll
Thoughts
I'll fix both scripts to handle tables without an
id
column.
Read
screenpipe_sync.sh and screenpipe_fts_migrate.sh
Thought
for 1s
screenpipe_sync.sh
+
8
-
1
else
# For tables without 'id', use rowid instead
sqlite3
"
$NAS_DB
"
"CREATE UNIQUE INDEX IF NOT EXISTS idx_${
tbl
}_install_pk ON ${
tbl
}(install_id, rowid);"
fi
done
screenpipe_fts_migrate.sh
+
8
-
1
else
# For tables without 'id', use rowid instead
sqlite3
"
$NAS_DB
"
"CREATE UNIQUE INDEX IF NOT EXISTS idx_${
tbl
}_install_pk ON ${
tbl
}(install_id, rowid);"
fi
fi
Thoughts
Fixed both scripts. They now check if a table has an
id
column before creating the unique index. If the table lacks an
id
column (like
ocr_text
), it uses
rowid
instead.
Changes made:
screenpipe_sync.sh
(line 369-375): Added check for
id
column, fallback to
rowid
screenpipe_fts_migrate.sh
(line 106-113): Added check for
id
column, fallback to
rowid
You can now retry running the sync script from your Mac.
Feedback submitted
lukas@Lukas-Kovaliks-MacBook-Pro-Jiminny ~/.screenpipe $ ~/.screenpipe/screenpipe_sync.sh 2026-05-11
[2026-05-12 20:58:45] ========================================
[2026-05-12 20:58:45] Screenpipe sync starting for: 2026-05-11
[2026-05-12 20:58:45] ========================================
[+00m00s] ▶ Preflight checks
Source DB: OK (4.2G)
NAS mount: OK /Volumes/screenpipe
Archive DB: will be created
Data dir: OK (283 files, 318M)
[+00m00s] ▶ Counting source rows for 2026-05-11
frames: 6857
elements: 672129
ui_events: 7063
ocr_text: 2332
meetings: 1
[+00m00s] ▶ Initialising tables (CREATE IF NOT EXISTS)
creating tables ✓ 0m00s
[+00m00s] ▶ Reconciling NAS schema with source
schema: video_chunks ✓ in sync
schema: frames ✓ in sync
schema: elements ✓ in sync
schema: ocr_text ✓ in sync
schema: ui_events ✓ in sync
schema: meetings ✓ in sync
creating indexes ✓ 0m01s
creating FTS tables ✓ 0m00s
[+00m02s] ▶ Syncing data for 2026-05-11
video_chunks ✓ 0m00s
frames (6857 rows) ✓ 2m37s
ocr_text (2332 rows) ⠋ Parse error near line 3: ambiguous column name: app_name
acted_at") SELECT "frame_id","text","text_json","app_name","ocr_engine","win
error here ---^
Expand content to scroll
Thought
for 3s
The error occurs because the
ocr_text
INSERT uses a JOIN with
frames
, and both tables have an
app_name
column. The script uses unqualified column names, causing SQLite to be ambiguous about which table's
app_name
to select.
The script's comment assumes no column name collisions, but
app_name
exists in both tables. The fix is to qualify all columns in the SELECT with the table alias
o.
.
Let me check the current INSERT statement:
Error while viewing
screenpipe_sync.sh
Thoughts
Read
screenpipe_sync.sh
#L320-334
Thoughts
Searched
ocr_text.*INSERT in /volume1/screenpipe (screenpipe_sync.sh)
Thought
for 2s
Searched
INSERT.*ocr_text in /volume1/screenpipe (screenpipe_sync.sh)
Thought
for 1s
screenpipe_sync.sh
+
2
-
1
INSERT OR IGNORE INTO nas.ocr_text
SELECT o.* FROM main.ocr_text o
SELECT o.frame_id, o.text, o.text_json, o.app_name, o.ocr_engine, o.window_name, o.focused, o.text_length, o.sync_id, o.synced_at, o.redacted_at
FROM main.ocr_text o
JOIN main.frames f ON o.frame_id = f.id
Thoughts
Fixed the
ocr_text
INSERT by explicitly listing columns instead of using
o.*
, which caused the ambiguous column name error with
app_name
existing in both
ocr_text
and
frames
.
Let me also check the
elements
table INSERT since it also uses a JOIN with frames and might have the same issue:
Searched
elements.*INSERT in /volume1/screenpipe (screenpipe_sync.sh)
Thought
for 1s
The script you're running is
~/.screenpipe/screenpipe_sync.sh
on your Mac, but I've been editing
screenpipe_sync.sh
on the NAS. These are different files.
You need to either:
Copy the fixed script from NAS to Mac, or
Apply the same fix to your Mac's script
The fix for the
ocr_text
ambiguous column error is to change:
bash
SELECT
o.
*
FROM
main.ocr_text
o
to:
bash
SELECT
o.frame_id,
o.text,
o.text_json,
o.app_name,
o.ocr_engine,
o.window_name,
o.focused,
o.text_length,
o.sync_id,
o.synced_at,
o.redacted_at
FROM
main.ocr_text
o
Which script are you actually running on your Mac - the one in
~/.screenpipe/
?
Feedback submitted
Now add also audio data. CREATE TABLE audio_chunks (
id INTEGER PRIMARY KEY AUTOINCREMENT,
file_path TEXT NOT NULL,
timestamp TIMESTAMP,
sync_id TEXT,
machine_id TEXT,
synced_at DATETIME,
evicted_at TIMESTAMP DEFAULT NULL
)
CREATE TABLE "audio_transcriptions" (
id INTEGER PRIMARY KEY AUTOINCREMENT,
audio_chunk_id INTEGER NOT NULL,
offset_index INTEGER NOT NULL,
timestamp TIMESTAMP NOT NULL,
transcription TEXT NOT NULL,
device TEXT NOT NULL DEFAULT '',
is_input_device BOOLEAN NOT NULL DEFAULT TRUE,
speaker_id INTEGER,
transcription_engine TEXT NOT NULL DEFAULT 'Whisper',
start_time REAL,
end_time REAL,
text_length INTEGER,
sync_id TEXT,
synced_at DATETIME,
redacted_at INTEGER,
FOREIGN KEY (audio_chunk_id) REFERENCES audio_chunks(id)
) CREATE VIRTUAL TABLE audio_transcriptions_fts USING fts5 (
transcription,
device,
speaker_id,
content='audio_transcriptions',
content_rowid='id',
tokenize='unicode61'
) This is still not included in the script (mac to nas) but it exists on original app (mac). Add it to the script. Also move logs (~/.screenpipe/screenpipe.2026-05-09.0.log) to /logs/screenpipe.2026-05-09.0.log on nas.
Expand content to scroll
Thought
for 1s
I'll add audio data syncing and log copying to the script. Let me first check the current script structure:
Read
screenpipe_sync.sh
Thought
for 3s
I'll add audio data syncing and log copying to the script. Let me make the necessary changes:
Error while editing
screenpipe_sync.sh
Thoughts
screenpipe_sync.sh
+
2
SRC_MEETINGS
=
$(
sqlite3
"
$DB_SRC
"
"SELECT COUNT(*) FROM meetings WHERE date(meeting_start) = '
$TARGET_DATE
';"
)
SRC_ACHUNKS
=
$(
sqlite3
"
$DB_SRC
"
"SELECT COUNT(*) FROM audio_chunks WHERE date(timestamp) = '
$TARGET_DATE
';"
)
SRC_ATRANS
=
$(
sqlite3
"
$DB_SRC
"
"SELECT COUNT(*) FROM audio_transcriptions WHERE audio_chunk_id IN (SELECT id FROM audio_chunks WHERE date(timestamp) = '
$TARGET_DATE
');"
)
Thought
for 1s
screenpipe_sync.sh
+
5
-
3
printf
" %-20s %s\n"
"audio_transcriptions:"
"
$SRC_ATRANS
"
if
[
"...
|
NULL
|
NULL
|
NULL
|
NULL
|
|
56314
|
1963
|
8
|
2026-05-19T07:54:27.249113+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-19/1779 /Users/lukas/.screenpipe/data/data/2026-05-19/1779177267249_m2.jpg...
|
PhpStorm
|
|
1
|
NULL
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
All Places
Preview
Filter
Open in Find Tool Window All Places
Preview
Filter
Open in Find Tool Window
AS
AssemblyAI.php .../app/Component/Transcription/TranscriptionProcessor/AssemblyAI/AssemblyAI.php, class
AssemblyAI.php .../app/Component/Transcription/DTO/Http/Transcription/AssemblyAI/AssemblyAI.php, final class
AssemblyAITest.php .../tests/Unit/Component/Transcription/TranscriptionProcessor/AssemblyAI/AssemblyAITest.php, final class
AssemblyAITest.php .../tests/Unit/Component/Transcription/DTO/Http/Transcript/AssemblyAI/AssemblyAITest.php, final class
AssemblyAiSummaryCreatedListenerTest.php .../tests/.../Crm/Summary/AssemblyAiSummaryCreatedListenerTest.php, class
AcquireSsrsLeaseContext.php vendor/google/apiclient-services/src/SQLAdmin/AcquireSsrsLeaseContext.php, class
AssertableJson.php vendor/laravel/framework/src/Illuminate/Testing/Fluent/AssertableJson.php, class
ArSsTest.php vendor/nesbot/carbon/tests/Localization/ArSsTest.php, class
AssemblyAI.php.html build/coverage/Component/Transcription/TranscriptionProcessor/AssemblyAI
AssemblyAI.php.html build/coverage/Component/Transcription/DTO/Http/Transcription/AssemblyAI
Assert.php vendor/webmozart/assert/src/Assert.php, class
Assert.php vendor/phpunit/phpunit/src/Framework/Assert.php, abstract class
Assert.php vendor/laravel/framework/src/Illuminate/Testing/Assert.php, abstract class
AssemblyAiFilter.php.html build/coverage/Component/AssemblyAI
AssemblyAiSummaryCreatedListener.php.html build/coverage/Listeners/Activities/Crm/Summary
… more
AssemblyAI.php .../app/Component/Transcription/TranscriptionProcessor/AssemblyAI/AssemblyAI.php, class
AssemblyAI.php .../app/Component/Transcription/DTO/Http/Transcription/AssemblyAI/AssemblyAI.php, final class
AssemblyAITest.php .../tests/Unit/Component/Transcription/TranscriptionProcessor/AssemblyAI/AssemblyAITest.php, final class
AssemblyAITest.php .../tests/Unit/Component/Transcription/DTO/Http/Transcript/AssemblyAI/AssemblyAITest.php, final class
AssemblyAiSummaryCreatedListenerTest.php .../tests/.../Crm/Summary/AssemblyAiSummaryCreatedListenerTest.php, class
AcquireSsrsLeaseContext.php vendor/google/apiclient-services/src/SQLAdmin/AcquireSsrsLeaseContext.php, class
AssertableJson.php vendor/laravel/framework/src/Illuminate/Testing/Fluent/AssertableJson.php, class
ArSsTest.php vendor/nesbot/carbon/tests/Localization/ArSsTest.php, class
AssemblyAI.php.html build/coverage/Component/Transcription/TranscriptionProcessor/AssemblyAI
AssemblyAI.php.html build/coverage/Component/Transcription/DTO/Http/Transcription/AssemblyAI
Assert.php vendor/webmozart/assert/src/Assert.php, class
Assert.php vendor/phpunit/phpunit/src/Framework/Assert.php, abstract class
Assert.php vendor/laravel/framework/src/Illuminate/Testing/Assert.php, abstract class
AssemblyAiFilter.php.html build/coverage/Component/AssemblyAI
AssemblyAiSummaryCreatedListener.php.html build/coverage/Listeners/Activities/Crm/Summary
… more
Component/Transcription/TranscriptionProcessor/AssemblyAI/AssemblyAI.php
Open In Right Split...
|
[{"role":"AXButton","text" [{"role":"AXButton","text":"All Places","depth":2,"bounds":{"left":0.7027925,"top":0.24181964,"width":0.029587766,"height":0.01915403},"on_screen":true,"role_description":"button","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Preview","depth":2,"bounds":{"left":0.73238033,"top":0.24181964,"width":0.008643617,"height":0.01915403},"on_screen":true,"role_description":"button","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Filter","depth":2,"bounds":{"left":0.74102396,"top":0.24181964,"width":0.008643617,"height":0.01915403},"on_screen":true,"role_description":"button","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Open in Find Tool Window","depth":2,"bounds":{"left":0.7496675,"top":0.24181964,"width":0.008643617,"height":0.01915403},"on_screen":true,"role_description":"button","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXTextField","text":"AS","depth":1,"bounds":{"left":0.50232714,"top":0.27214685,"width":0.25598404,"height":0.023144454},"on_screen":true,"value":"AS","role_description":"text field","is_enabled":true,"is_focused":true,"is_selected":false,"is_expanded":false},{"role":"AXStaticText","text":"AssemblyAI.php .../app/Component/Transcription/TranscriptionProcessor/AssemblyAI/AssemblyAI.php, class","depth":2,"bounds":{"left":0.4993351,"top":0.30407023,"width":0.26196808,"height":0.017557861},"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"AssemblyAI.php .../app/Component/Transcription/DTO/Http/Transcription/AssemblyAI/AssemblyAI.php, final class","depth":2,"bounds":{"left":0.4993351,"top":0.3216281,"width":0.26196808,"height":0.017557861},"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"AssemblyAITest.php .../tests/Unit/Component/Transcription/TranscriptionProcessor/AssemblyAI/AssemblyAITest.php, final class","depth":2,"bounds":{"left":0.4993351,"top":0.33918595,"width":0.26196808,"height":0.017557861},"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"AssemblyAITest.php .../tests/Unit/Component/Transcription/DTO/Http/Transcript/AssemblyAI/AssemblyAITest.php, final class","depth":2,"bounds":{"left":0.4993351,"top":0.3567438,"width":0.26196808,"height":0.017557861},"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"AssemblyAiSummaryCreatedListenerTest.php .../tests/.../Crm/Summary/AssemblyAiSummaryCreatedListenerTest.php, class","depth":2,"bounds":{"left":0.4993351,"top":0.37430167,"width":0.26196808,"height":0.017557861},"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"AcquireSsrsLeaseContext.php vendor/google/apiclient-services/src/SQLAdmin/AcquireSsrsLeaseContext.php, class","depth":2,"bounds":{"left":0.4993351,"top":0.39185953,"width":0.26196808,"height":0.017557861},"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"AssertableJson.php vendor/laravel/framework/src/Illuminate/Testing/Fluent/AssertableJson.php, class","depth":2,"bounds":{"left":0.4993351,"top":0.4094174,"width":0.26196808,"height":0.017557861},"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"ArSsTest.php vendor/nesbot/carbon/tests/Localization/ArSsTest.php, class","depth":2,"bounds":{"left":0.4993351,"top":0.42697525,"width":0.26196808,"height":0.017557861},"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"AssemblyAI.php.html build/coverage/Component/Transcription/TranscriptionProcessor/AssemblyAI","depth":2,"bounds":{"left":0.4993351,"top":0.4445331,"width":0.26196808,"height":0.017557861},"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"AssemblyAI.php.html build/coverage/Component/Transcription/DTO/Http/Transcription/AssemblyAI","depth":2,"bounds":{"left":0.4993351,"top":0.46209097,"width":0.26196808,"height":0.017557861},"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Assert.php vendor/webmozart/assert/src/Assert.php, class","depth":2,"bounds":{"left":0.4993351,"top":0.47964883,"width":0.26196808,"height":0.017557861},"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Assert.php vendor/phpunit/phpunit/src/Framework/Assert.php, abstract class","depth":2,"bounds":{"left":0.4993351,"top":0.49720672,"width":0.26196808,"height":0.017557861},"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Assert.php vendor/laravel/framework/src/Illuminate/Testing/Assert.php, abstract class","depth":2,"bounds":{"left":0.4993351,"top":0.51476455,"width":0.26196808,"height":0.017557861},"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"AssemblyAiFilter.php.html build/coverage/Component/AssemblyAI","depth":2,"bounds":{"left":0.4993351,"top":0.5323224,"width":0.26196808,"height":0.017557861},"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"AssemblyAiSummaryCreatedListener.php.html build/coverage/Listeners/Activities/Crm/Summary","depth":2,"bounds":{"left":0.4993351,"top":0.54988027,"width":0.26196808,"height":0.017557861},"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"… more","depth":2,"bounds":{"left":0.4993351,"top":0.5674381,"width":0.26196808,"height":0.017557861},"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"AssemblyAI.php .../app/Component/Transcription/TranscriptionProcessor/AssemblyAI/AssemblyAI.php, class","depth":4,"bounds":{"left":0.4993351,"top":0.30407023,"width":0.26196808,"height":0.017557861},"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"AssemblyAI.php .../app/Component/Transcription/DTO/Http/Transcription/AssemblyAI/AssemblyAI.php, final class","depth":4,"bounds":{"left":0.4993351,"top":0.3216281,"width":0.26196808,"height":0.017557861},"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"AssemblyAITest.php .../tests/Unit/Component/Transcription/TranscriptionProcessor/AssemblyAI/AssemblyAITest.php, final class","depth":4,"bounds":{"left":0.4993351,"top":0.33918595,"width":0.26196808,"height":0.017557861},"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"AssemblyAITest.php .../tests/Unit/Component/Transcription/DTO/Http/Transcript/AssemblyAI/AssemblyAITest.php, final class","depth":4,"bounds":{"left":0.4993351,"top":0.3567438,"width":0.26196808,"height":0.017557861},"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"AssemblyAiSummaryCreatedListenerTest.php .../tests/.../Crm/Summary/AssemblyAiSummaryCreatedListenerTest.php, class","depth":4,"bounds":{"left":0.4993351,"top":0.37430167,"width":0.26196808,"height":0.017557861},"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"AcquireSsrsLeaseContext.php vendor/google/apiclient-services/src/SQLAdmin/AcquireSsrsLeaseContext.php, class","depth":4,"bounds":{"left":0.4993351,"top":0.39185953,"width":0.26196808,"height":0.017557861},"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"AssertableJson.php vendor/laravel/framework/src/Illuminate/Testing/Fluent/AssertableJson.php, class","depth":4,"bounds":{"left":0.4993351,"top":0.4094174,"width":0.26196808,"height":0.017557861},"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"ArSsTest.php vendor/nesbot/carbon/tests/Localization/ArSsTest.php, class","depth":4,"bounds":{"left":0.4993351,"top":0.42697525,"width":0.26196808,"height":0.017557861},"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"AssemblyAI.php.html build/coverage/Component/Transcription/TranscriptionProcessor/AssemblyAI","depth":4,"bounds":{"left":0.4993351,"top":0.4445331,"width":0.26196808,"height":0.017557861},"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"AssemblyAI.php.html build/coverage/Component/Transcription/DTO/Http/Transcription/AssemblyAI","depth":4,"bounds":{"left":0.4993351,"top":0.46209097,"width":0.26196808,"height":0.017557861},"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Assert.php vendor/webmozart/assert/src/Assert.php, class","depth":4,"bounds":{"left":0.4993351,"top":0.47964883,"width":0.26196808,"height":0.017557861},"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Assert.php vendor/phpunit/phpunit/src/Framework/Assert.php, abstract class","depth":4,"bounds":{"left":0.4993351,"top":0.49720672,"width":0.26196808,"height":0.017557861},"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Assert.php vendor/laravel/framework/src/Illuminate/Testing/Assert.php, abstract class","depth":4,"bounds":{"left":0.4993351,"top":0.51476455,"width":0.26196808,"height":0.017557861},"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"AssemblyAiFilter.php.html build/coverage/Component/AssemblyAI","depth":4,"bounds":{"left":0.4993351,"top":0.5323224,"width":0.26196808,"height":0.017557861},"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"AssemblyAiSummaryCreatedListener.php.html build/coverage/Listeners/Activities/Crm/Summary","depth":4,"bounds":{"left":0.4993351,"top":0.54988027,"width":0.26196808,"height":0.017557861},"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"… more","depth":4,"bounds":{"left":0.4993351,"top":0.5674381,"width":0.26196808,"height":0.017557861},"on_screen":true,"role_description":"text"},{"role":"AXStaticText","text":"Component/Transcription/TranscriptionProcessor/AssemblyAI/AssemblyAI.php","depth":1,"bounds":{"left":0.50598407,"top":0.75259376,"width":0.14793883,"height":0.013567438},"on_screen":true,"help_text":"Component/Transcription/TranscriptionProcessor/AssemblyAI/AssemblyAI.php","role_description":"text"},{"role":"AXLink","text":"Open In Right Split","depth":1,"bounds":{"left":0.71476066,"top":0.75259376,"width":0.03856383,"height":0.013567438},"on_screen":true,"role_description":"link","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false}]...
|
-3371425163448320290
|
9211519533236984121
|
visual_change
|
accessibility
|
NULL
|
All Places
Preview
Filter
Open in Find Tool Window All Places
Preview
Filter
Open in Find Tool Window
AS
AssemblyAI.php .../app/Component/Transcription/TranscriptionProcessor/AssemblyAI/AssemblyAI.php, class
AssemblyAI.php .../app/Component/Transcription/DTO/Http/Transcription/AssemblyAI/AssemblyAI.php, final class
AssemblyAITest.php .../tests/Unit/Component/Transcription/TranscriptionProcessor/AssemblyAI/AssemblyAITest.php, final class
AssemblyAITest.php .../tests/Unit/Component/Transcription/DTO/Http/Transcript/AssemblyAI/AssemblyAITest.php, final class
AssemblyAiSummaryCreatedListenerTest.php .../tests/.../Crm/Summary/AssemblyAiSummaryCreatedListenerTest.php, class
AcquireSsrsLeaseContext.php vendor/google/apiclient-services/src/SQLAdmin/AcquireSsrsLeaseContext.php, class
AssertableJson.php vendor/laravel/framework/src/Illuminate/Testing/Fluent/AssertableJson.php, class
ArSsTest.php vendor/nesbot/carbon/tests/Localization/ArSsTest.php, class
AssemblyAI.php.html build/coverage/Component/Transcription/TranscriptionProcessor/AssemblyAI
AssemblyAI.php.html build/coverage/Component/Transcription/DTO/Http/Transcription/AssemblyAI
Assert.php vendor/webmozart/assert/src/Assert.php, class
Assert.php vendor/phpunit/phpunit/src/Framework/Assert.php, abstract class
Assert.php vendor/laravel/framework/src/Illuminate/Testing/Assert.php, abstract class
AssemblyAiFilter.php.html build/coverage/Component/AssemblyAI
AssemblyAiSummaryCreatedListener.php.html build/coverage/Listeners/Activities/Crm/Summary
… more
AssemblyAI.php .../app/Component/Transcription/TranscriptionProcessor/AssemblyAI/AssemblyAI.php, class
AssemblyAI.php .../app/Component/Transcription/DTO/Http/Transcription/AssemblyAI/AssemblyAI.php, final class
AssemblyAITest.php .../tests/Unit/Component/Transcription/TranscriptionProcessor/AssemblyAI/AssemblyAITest.php, final class
AssemblyAITest.php .../tests/Unit/Component/Transcription/DTO/Http/Transcript/AssemblyAI/AssemblyAITest.php, final class
AssemblyAiSummaryCreatedListenerTest.php .../tests/.../Crm/Summary/AssemblyAiSummaryCreatedListenerTest.php, class
AcquireSsrsLeaseContext.php vendor/google/apiclient-services/src/SQLAdmin/AcquireSsrsLeaseContext.php, class
AssertableJson.php vendor/laravel/framework/src/Illuminate/Testing/Fluent/AssertableJson.php, class
ArSsTest.php vendor/nesbot/carbon/tests/Localization/ArSsTest.php, class
AssemblyAI.php.html build/coverage/Component/Transcription/TranscriptionProcessor/AssemblyAI
AssemblyAI.php.html build/coverage/Component/Transcription/DTO/Http/Transcription/AssemblyAI
Assert.php vendor/webmozart/assert/src/Assert.php, class
Assert.php vendor/phpunit/phpunit/src/Framework/Assert.php, abstract class
Assert.php vendor/laravel/framework/src/Illuminate/Testing/Assert.php, abstract class
AssemblyAiFilter.php.html build/coverage/Component/AssemblyAI
AssemblyAiSummaryCreatedListener.php.html build/coverage/Listeners/Activities/Crm/Summary
… more
Component/Transcription/TranscriptionProcessor/AssemblyAI/AssemblyAI.php
Open In Right Split...
|
NULL
|
NULL
|
NULL
|
NULL
|
|
38871
|
1440
|
14
|
2026-05-14T06:30:08.074405+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740208074_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Mute tab
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXButton","text":"Mute tab","depth":5,"bounds":{"left":0.011469414,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.020113032,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
|
-4875849404314601076
|
9210195691199191703
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Mute tab
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The...
|
NULL
|
NULL
|
NULL
|
NULL
|
|
30948
|
1219
|
10
|
2026-05-13T08:12:00.587081+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-13/1778 /Users/lukas/.screenpipe/data/data/2026-05-13/1778659920587_m1.jpg...
|
Firefox
|
DXP4800PLUS-B5F8 — Personal
|
1
|
nas.lakylak.xyz/desktop/#/
|
monitor_1
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Close tab
New Tab
New Tab
Screenpipe — Archive
Screenpipe — Archive
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue....
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"New Tab","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"New Tab","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
|
-8372074269719931801
|
9210195553223236247
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Close tab
New Tab
New Tab
Screenpipe — Archive
Screenpipe — Archive
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue....
|
30946
|
NULL
|
NULL
|
NULL
|
|
30797
|
1215
|
26
|
2026-05-13T08:03:16.050946+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-13/1778 /Users/lukas/.screenpipe/data/data/2026-05-13/1778659396050_m1.jpg...
|
Firefox
|
DXP4800PLUS-B5F8 — Personal
|
1
|
nas.lakylak.xyz/desktop/#/
|
monitor_1
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Close tab
New Tab
New Tab
Screenpipe — Archive
Screenpipe — Archive
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
Edit
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive....
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"New Tab","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"New Tab","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Edit","depth":21,"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"bounds":{"left":0.0,"top":0.0055555557,"width":0.090277776,"height":0.022777777},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
|
7381601577196835231
|
9210195551075818135
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Close tab
New Tab
New Tab
Screenpipe — Archive
Screenpipe — Archive
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
Edit
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive....
|
30795
|
NULL
|
NULL
|
NULL
|
|
38802
|
1439
|
74
|
2026-05-14T06:29:12.188636+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740152188_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
|
6427856706452513825
|
9210195551075715735
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."...
|
38800
|
NULL
|
NULL
|
NULL
|
|
38806
|
1439
|
76
|
2026-05-14T06:29:13.351440+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740153351_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
|
6427856706452513825
|
9210195551075715735
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."...
|
38804
|
NULL
|
NULL
|
NULL
|
|
39106
|
1442
|
81
|
2026-05-14T06:36:06.582355+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740566582_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
|
4732658358071765459
|
9210195551075715735
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or...
|
NULL
|
NULL
|
NULL
|
NULL
|
|
32035
|
1241
|
45
|
2026-05-13T09:03:54.191879+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-13/1778 /Users/lukas/.screenpipe/data/data/2026-05-13/1778663034191_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
New Tab
New Tab
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Expand
Listen
Show thinking
Gemini said
Gemini said
Yes, those
.mp4
files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your
~/.screenpipe/data/
directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the
db.sqlite
database along with
precise timestamps and file path references
.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific
.mp4
file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw
.mp4
files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that
.mp4
file is your fallback source of truth to verify what was actually said.
How the Database Views Them
How the Database Views Them
If you were to open up your
db.sqlite
and query the tables (e.g.,
audio_transcriptions
), you would see that the database treats the
~/.screenpipe/data/
folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like
soundcore AeroClip (input)_2026-05-12_07-40-48.mp4
.
What happens if you delete them?
If you manually
rm
those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
Managing the Storage Footprint
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection:
ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving:
If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly
older
files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
Edit
You said yes I will do that. Is there a way to setup languages to transcription?
You said
yes I will do that. Is there a way to setup languages to transcription?
Listen
Show thinking
Gemini said
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription,
yes, you can configure it
, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
How the Transcription Engine Handles Language
By default, ScreenPipe uses a
multilingual Whisper model
and sets the language configuration to
Auto-Detect
.
When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage:
If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback:
Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
Open the ScreenPipe settings.
Navigate to the
Audio
or
AI/Transcription
section.
Look for the
Whisper Language
or
Transcription Language
dropdown.
Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).
If you are running ScreenPipe via CLI/Config:
You can modify your underlying configuration (usually found in
~/.screenpipe/pipe.json
or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:
(Slovak)
(Bulgarian)
(English)
If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
Good response
Bad response
Redo
Share and export
Copy
Show more options
Ask Gemini
Ask Gemini
Open upload file menu
Tools
Open mode picker
Pro
Microphone
Send message
Gemini is AI and can make mistakes, including about people.
Your privacy and Gemini Opens in a new window
Your privacy and Gemini
Opens in a new window
Summarize page
Summarize page
Screenpipe [archive.db · 763MB]
Screenpipe
[archive.db · 763MB]
Activity
Search
Audio
Work Report
Timetable
AI Summary
Date
12
/
05
/
2026
Calendar
Monitor
Jump to
--
:
--
Go
APP TIMELINE · CLICK TO PLAY · DRAG SCROLLBAR TO PAN
−
1×
+
Follow
Follow
09:30
10:00
10:30
11:00
11:30
12:00
12:30
13:00
13:30
14:00
14:30
15:00
15:30
16:00
16:30
17:00
17:30
18:00
18:30
19:00
19:30
20:00
20:30
21:00
21:30
12 May 10:12 · Firefox / planhat - Google Search — Work
⏮ 30s
◀ 10s
⏸ Pause
10s ▶...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"New Tab","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"New Tab","depth":5,"bounds":{"left":0.013297873,"top":0.16121309,"width":0.014960106,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.18994413,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.3463687,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.3575419,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.38068634,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.05817819,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"bounds":{"left":0.17885639,"top":0.0,"width":0.0034906915,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.21010639,"height":0.037110932},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.011303191,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"bounds":{"left":0.120678194,"top":0.0,"width":0.008144947,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"bounds":{"left":0.14960106,"top":0.0,"width":0.020944148,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.21542554,"height":0.037110932},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"bounds":{"left":0.09142287,"top":0.029928172,"width":0.028922873,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"bounds":{"left":0.12034574,"top":0.029928172,"width":0.106715426,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"bounds":{"left":0.09142287,"top":0.029928172,"width":0.21991356,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"bounds":{"left":0.09142287,"top":0.050678372,"width":0.22174202,"height":0.057861134},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"bounds":{"left":0.0787899,"top":0.13288109,"width":0.22456782,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"bounds":{"left":0.0787899,"top":0.18715084,"width":0.036402926,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"bounds":{"left":0.091755316,"top":0.19592977,"width":0.017785905,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"bounds":{"left":0.075465426,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"bounds":{"left":0.08610372,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"bounds":{"left":0.09674202,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"bounds":{"left":0.107380316,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"bounds":{"left":0.11801862,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy prompt","depth":21,"bounds":{"left":0.14029256,"top":0.292498,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":21,"bounds":{"left":0.16023937,"top":0.30207503,"width":0.13696809,"height":0.09577015},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"bounds":{"left":0.068484046,"top":0.3028731,"width":0.019946808,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":23,"bounds":{"left":0.16023937,"top":0.30367118,"width":0.13663563,"height":0.13128492},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Expand","depth":21,"bounds":{"left":0.29720744,"top":0.30207503,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Listen","depth":24,"bounds":{"left":0.3025266,"top":0.42976856,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"bounds":{"left":0.09208777,"top":0.43216282,"width":0.030917553,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"bounds":{"left":0.08976064,"top":0.4736632,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"bounds":{"left":0.08976064,"top":0.47565842,"width":0.04105718,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Yes, those","depth":27,"bounds":{"left":0.0787899,"top":0.47685555,"width":0.025930852,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"bounds":{"left":0.106715426,"top":0.47805268,"width":0.011136968,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.","depth":27,"bounds":{"left":0.0787899,"top":0.47685555,"width":0.23005319,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is exactly what those files are doing sitting in your","depth":27,"bounds":{"left":0.0787899,"top":0.5271349,"width":0.1341423,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"bounds":{"left":0.21492687,"top":0.528332,"width":0.053025264,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"directory and how they interact with the rest of the system.","depth":27,"bounds":{"left":0.0787899,"top":0.5271349,"width":0.22623006,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"1. The \"Time Machine\" Playback","depth":26,"bounds":{"left":0.0787899,"top":0.5885874,"width":0.234375,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. The \"Time Machine\" Playback","depth":27,"bounds":{"left":0.0787899,"top":0.59018356,"width":0.08194814,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the","depth":27,"bounds":{"left":0.0787899,"top":0.61652035,"width":0.20761304,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"bounds":{"left":0.24883644,"top":0.63846767,"width":0.025099734,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"database along with","depth":27,"bounds":{"left":0.0787899,"top":0.63727057,"width":0.22240691,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"precise timestamps and file path references","depth":27,"bounds":{"left":0.105053194,"top":0.65802073,"width":0.11419548,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"bounds":{"left":0.21924867,"top":0.65802073,"width":0.0013297872,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific","depth":27,"bounds":{"left":0.0787899,"top":0.6875499,"width":0.22755983,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"bounds":{"left":0.27293882,"top":0.7094972,"width":0.011136968,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.","depth":27,"bounds":{"left":0.0787899,"top":0.70830005,"width":0.23354389,"height":0.07861133},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. Re-processing and Model Upgrades","depth":26,"bounds":{"left":0.0787899,"top":0.811253,"width":0.234375,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. Re-processing and Model Upgrades","depth":27,"bounds":{"left":0.0787899,"top":0.81284916,"width":0.09840426,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw","depth":27,"bounds":{"left":0.0787899,"top":0.83918595,"width":0.23254654,"height":0.07861133},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"bounds":{"left":0.117519945,"top":0.90263367,"width":0.011136968,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.","depth":27,"bounds":{"left":0.0787899,"top":0.90143657,"width":0.23038563,"height":0.057861134},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The Source of Truth for Hallucinations","depth":26,"bounds":{"left":0.0787899,"top":0.98363924,"width":0.234375,"height":0.01636076},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The Source of Truth for Hallucinations","depth":27,"bounds":{"left":0.0787899,"top":0.98523545,"width":0.10405585,"height":0.014764547},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that","depth":27,"bounds":{"left":0.0787899,"top":1.0,"width":0.2209109,"height":-0.011572242},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"bounds":{"left":0.14660904,"top":1.0,"width":0.011303191,"height":-0.05426979},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file is your fallback source of truth to verify what was actually said.","depth":27,"bounds":{"left":0.0787899,"top":1.0,"width":0.23121676,"height":-0.05307257},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How the Database Views Them","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How the Database Views Them","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you were to open up your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"and query the tables (e.g.,","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"audio_transcriptions","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), you would see that the database treats the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"soundcore AeroClip (input)_2026-05-12_07-40-48.mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"What happens if you delete them?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you manually","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"rm","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a \"file not found\" error in the background logs.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Managing the Storage Footprint","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Managing the Storage Footprint","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe records continuously, this folder will inevitably grow over time.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Built-in Garbage Collection:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Archiving:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you prefer to keep a permanent, searchable \"life log\" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"older","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Edit","depth":21,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said yes I will do that. Is there a way to setup languages to transcription?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"yes I will do that. Is there a way to setup languages to transcription?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As for setting up languages for transcription,","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"yes, you can configure it","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":", but it helps to understand how ScreenPipe handles language natively first.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How the Transcription Engine Handles Language","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How the Transcription Engine Handles Language","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"multilingual Whisper model","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"and sets the language configuration to","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Auto-Detect","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Advantage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Drawback:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How to Force a Specific Language","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How to Force a Specific Language","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you are using the Desktop App UI:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Open the ScreenPipe settings.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Navigate to the","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"AI/Transcription","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"section.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Look for the","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper Language","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Transcription Language","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"dropdown.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Change it from \"Auto\" to your specific language (e.g., English, Bulgarian, or Slovak).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you are running ScreenPipe via CLI/Config:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You can modify your underlying configuration (usually found in","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/pipe.json","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"(Slovak)","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"(Bulgarian)","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"(English)","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":23,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":23,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Redo","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Share and export","depth":22,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":22,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXTextArea","text":"Ask Gemini","depth":20,"bounds":{"left":0.08211436,"top":0.83439744,"width":0.22573139,"height":0.01915403},"on_screen":true,"value":"Ask Gemini","help_text":"","role_description":"text entry area","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Ask Gemini","depth":21,"bounds":{"left":0.08211436,"top":0.8347965,"width":0.030086435,"height":0.018355945},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Open upload file menu","depth":20,"bounds":{"left":0.078125,"top":0.87031126,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Tools","depth":18,"bounds":{"left":0.094082445,"top":0.87031126,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Open mode picker","depth":20,"bounds":{"left":0.27044547,"top":0.867917,"width":0.026097074,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Pro","depth":23,"bounds":{"left":0.2757646,"top":0.87669593,"width":0.007480053,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Microphone","depth":19,"bounds":{"left":0.29853722,"top":0.867917,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Send message","depth":19,"bounds":{"left":0.30485374,"top":0.8671189,"width":0.013962766,"height":0.033519555},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":false,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini is AI and can make mistakes, including about people.","depth":17,"bounds":{"left":0.11702128,"top":0.92178774,"width":0.11170213,"height":0.012370312},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXLink","text":"Your privacy and Gemini Opens in a new window","depth":17,"bounds":{"left":0.2287234,"top":0.92178774,"width":0.044215426,"height":0.012370312},"on_screen":true,"help_text":"","role_description":"link","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Your privacy and Gemini","depth":18,"bounds":{"left":0.2287234,"top":0.92178774,"width":0.044215426,"height":0.012370312},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Opens in a new window","depth":19,"bounds":{"left":0.068484046,"top":0.92098963,"width":0.043218084,"height":0.012370312},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Summarize page","depth":7,"bounds":{"left":0.07413564,"top":0.95730245,"width":0.053523935,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Summarize page","depth":9,"bounds":{"left":0.07978723,"top":0.96249,"width":0.042220745,"height":0.015163607},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Screenpipe [archive.db · 763MB]","depth":7,"bounds":{"left":0.33061835,"top":0.061452515,"width":0.06067154,"height":0.017956903},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Screenpipe","depth":8,"bounds":{"left":0.33061835,"top":0.06304868,"width":0.027759308,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"[archive.db · 763MB]","depth":9,"bounds":{"left":0.35970744,"top":0.06703911,"width":0.03158245,"height":0.009976057},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Activity","depth":7,"bounds":{"left":0.39594415,"top":0.059856344,"width":0.024933511,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Search","depth":7,"bounds":{"left":0.42154256,"top":0.059856344,"width":0.023603724,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Audio","depth":7,"bounds":{"left":0.44581118,"top":0.059856344,"width":0.020944148,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Work Report","depth":7,"bounds":{"left":0.46742022,"top":0.059856344,"width":0.03507314,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Timetable","depth":7,"bounds":{"left":0.5031583,"top":0.059856344,"width":0.029753989,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Summary","depth":7,"bounds":{"left":0.53357714,"top":0.059856344,"width":0.034075797,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Date","depth":8,"bounds":{"left":0.93866354,"top":0.0650439,"width":0.008144947,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"12","depth":9,"bounds":{"left":0.95545214,"top":0.06464485,"width":0.0048204786,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"/","depth":8,"bounds":{"left":0.96127,"top":0.06464485,"width":0.0023271276,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"05","depth":9,"bounds":{"left":0.9645944,"top":0.06464485,"width":0.0048204786,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"/","depth":8,"bounds":{"left":0.97041225,"top":0.06464485,"width":0.002493351,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2026","depth":9,"bounds":{"left":0.97390294,"top":0.06464485,"width":0.009474734,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Calendar","depth":8,"bounds":{"left":0.9847075,"top":0.0650439,"width":0.0051529254,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXStaticText","text":"Monitor","depth":9,"bounds":{"left":0.45262632,"top":0.10853951,"width":0.013464096,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Jump to","depth":9,"bounds":{"left":0.8111702,"top":0.10853951,"width":0.01412899,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"--","depth":10,"bounds":{"left":0.8312833,"top":0.10814046,"width":0.0048204786,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":":","depth":9,"bounds":{"left":0.83710104,"top":0.10814046,"width":0.0023271276,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"--","depth":10,"bounds":{"left":0.84042555,"top":0.10814046,"width":0.0048204786,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Go","depth":8,"bounds":{"left":0.85920876,"top":0.10454908,"width":0.012300532,"height":0.018754989},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"APP TIMELINE · CLICK TO PLAY · DRAG SCROLLBAR TO PAN","depth":10,"bounds":{"left":0.45761302,"top":0.14964086,"width":0.10571808,"height":0.009976057},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"−","depth":9,"bounds":{"left":0.813996,"top":0.1452514,"width":0.009807181,"height":0.018754989},"on_screen":true,"help_text":"Zoom out","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"1×","depth":10,"bounds":{"left":0.82795876,"top":0.14924182,"width":0.004155585,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"+","depth":9,"bounds":{"left":0.83643615,"top":0.1452514,"width":0.009640957,"height":0.018754989},"on_screen":true,"help_text":"Zoom in","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Follow","depth":10,"bounds":{"left":0.8494016,"top":0.14924182,"width":0.004654255,"height":0.011173184},"on_screen":true,"help_text":"","role_description":"checkbox","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Follow","depth":10,"bounds":{"left":0.85538566,"top":0.14924182,"width":0.011136968,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"09:30","depth":13,"bounds":{"left":0.45894283,"top":0.21947326,"width":0.00880984,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"10:00","depth":13,"bounds":{"left":0.47573137,"top":0.21947326,"width":0.008144947,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"10:30","depth":13,"bounds":{"left":0.4921875,"top":0.21947326,"width":0.008144947,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"11:00","depth":13,"bounds":{"left":0.50880986,"top":0.21947326,"width":0.0078125,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"11:30","depth":13,"bounds":{"left":0.52526593,"top":0.21947326,"width":0.0078125,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"12:00","depth":13,"bounds":{"left":0.5415558,"top":0.21947326,"width":0.008144947,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"12:30","depth":13,"bounds":{"left":0.55801195,"top":0.21947326,"width":0.008144947,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"13:00","depth":13,"bounds":{"left":0.57430184,"top":0.21947326,"width":0.00831117,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"13:30","depth":13,"bounds":{"left":0.59075797,"top":0.21947326,"width":0.00831117,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"14:00","depth":13,"bounds":{"left":0.6072141,"top":0.21947326,"width":0.00831117,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"14:30","depth":13,"bounds":{"left":0.6236702,"top":0.21947326,"width":0.00831117,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"15:00","depth":13,"bounds":{"left":0.64012635,"top":0.21947326,"width":0.00831117,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"15:30","depth":13,"bounds":{"left":0.6565825,"top":0.21947326,"width":0.008144947,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"16:00","depth":13,"bounds":{"left":0.67303854,"top":0.21947326,"width":0.008144947,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"16:30","depth":13,"bounds":{"left":0.68949467,"top":0.21947326,"width":0.008144947,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"17:00","depth":13,"bounds":{"left":0.70611703,"top":0.21947326,"width":0.007978723,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"17:30","depth":13,"bounds":{"left":0.72257316,"top":0.21947326,"width":0.007978723,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"18:00","depth":13,"bounds":{"left":0.73886305,"top":0.21947326,"width":0.008144947,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"18:30","depth":13,"bounds":{"left":0.7553192,"top":0.21947326,"width":0.008144947,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"19:00","depth":13,"bounds":{"left":0.77177525,"top":0.21947326,"width":0.008144947,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"19:30","depth":13,"bounds":{"left":0.7882314,"top":0.21947326,"width":0.008144947,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"20:00","depth":13,"bounds":{"left":0.804355,"top":0.21947326,"width":0.008643617,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"20:30","depth":13,"bounds":{"left":0.82081115,"top":0.21947326,"width":0.008643617,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"21:00","depth":13,"bounds":{"left":0.83759975,"top":0.21947326,"width":0.008144947,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"21:30","depth":13,"bounds":{"left":0.8540558,"top":0.21947326,"width":0.008144947,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"12 May 10:12 · Firefox / planhat - Google Search — Work","depth":10,"bounds":{"left":0.45628324,"top":0.2661612,"width":0.10488697,"height":0.011971269},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"⏮ 30s","depth":9,"bounds":{"left":0.45694813,"top":0.7186752,"width":0.023936171,"height":0.02434158},"on_screen":true,"help_text":"Ctrl+←","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"◀ 10s","depth":9,"bounds":{"left":0.48354387,"top":0.71907425,"width":0.02244016,"height":0.023942538},"on_screen":true,"help_text":"←","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"⏸ Pause","depth":9,"bounds":{"left":0.5086436,"top":0.7186752,"width":0.027925532,"height":0.02434158},"on_screen":true,"help_text":"Space","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"10s ▶","depth":9,"bounds":{"left":0.53922874,"top":0.71907425,"width":0.022273935,"height":0.023942538},"on_screen":true,"help_text":"→","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false}]...
|
-335103884459386528
|
9209073691035053021
|
visual_change
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
New Tab
New Tab
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Expand
Listen
Show thinking
Gemini said
Gemini said
Yes, those
.mp4
files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your
~/.screenpipe/data/
directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the
db.sqlite
database along with
precise timestamps and file path references
.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific
.mp4
file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw
.mp4
files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that
.mp4
file is your fallback source of truth to verify what was actually said.
How the Database Views Them
How the Database Views Them
If you were to open up your
db.sqlite
and query the tables (e.g.,
audio_transcriptions
), you would see that the database treats the
~/.screenpipe/data/
folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like
soundcore AeroClip (input)_2026-05-12_07-40-48.mp4
.
What happens if you delete them?
If you manually
rm
those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
Managing the Storage Footprint
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection:
ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving:
If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly
older
files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
Edit
You said yes I will do that. Is there a way to setup languages to transcription?
You said
yes I will do that. Is there a way to setup languages to transcription?
Listen
Show thinking
Gemini said
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription,
yes, you can configure it
, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
How the Transcription Engine Handles Language
By default, ScreenPipe uses a
multilingual Whisper model
and sets the language configuration to
Auto-Detect
.
When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage:
If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback:
Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
Open the ScreenPipe settings.
Navigate to the
Audio
or
AI/Transcription
section.
Look for the
Whisper Language
or
Transcription Language
dropdown.
Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).
If you are running ScreenPipe via CLI/Config:
You can modify your underlying configuration (usually found in
~/.screenpipe/pipe.json
or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:
(Slovak)
(Bulgarian)
(English)
If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
Good response
Bad response
Redo
Share and export
Copy
Show more options
Ask Gemini
Ask Gemini
Open upload file menu
Tools
Open mode picker
Pro
Microphone
Send message
Gemini is AI and can make mistakes, including about people.
Your privacy and Gemini Opens in a new window
Your privacy and Gemini
Opens in a new window
Summarize page
Summarize page
Screenpipe [archive.db · 763MB]
Screenpipe
[archive.db · 763MB]
Activity
Search
Audio
Work Report
Timetable
AI Summary
Date
12
/
05
/
2026
Calendar
Monitor
Jump to
--
:
--
Go
APP TIMELINE · CLICK TO PLAY · DRAG SCROLLBAR TO PAN
−
1×
+
Follow
Follow
09:30
10:00
10:30
11:00
11:30
12:00
12:30
13:00
13:30
14:00
14:30
15:00
15:30
16:00
16:30
17:00
17:30
18:00
18:30
19:00
19:30
20:00
20:30
21:00
21:30
12 May 10:12 · Firefox / planhat - Google Search — Work
⏮ 30s
◀ 10s
⏸ Pause
10s ▶...
|
32033
|
NULL
|
NULL
|
NULL
|
|
38648
|
1435
|
9
|
2026-05-14T06:25:30.850714+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778739930850_m1.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_1
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
New Tab
New Tab
Screenpipe — Archive
Mute tab
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Expand
Listen
Show thinking
Gemini said
Gemini said
Yes, those
.mp4
files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your
~/.screenpipe/data/
directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the
db.sqlite
database along with
precise timestamps and file path references
.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific
.mp4
file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw
.mp4
files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that
.mp4
file is your fallback source of truth to verify what was actually said.
How the Database Views Them
How the Database Views Them
If you were to open up your
db.sqlite
and query the tables (e.g.,
audio_transcriptions
), you would see that the database treats the
~/.screenpipe/data/
folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like
soundcore AeroClip (input)_2026-05-12_07-40-48.mp4
.
What happens if you delete them?
If you manually
rm
those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
Managing the Storage Footprint
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection:
ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving:
If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly
older
files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
Edit
You said yes I will do that. Is there a way to setup languages to transcription?
You said
yes I will do that. Is there a way to setup languages to transcription?
Listen
Show thinking
Gemini said
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription,
yes, you can configure it
, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
How the Transcription Engine Handles Language
By default, ScreenPipe uses a
multilingual Whisper model
and sets the language configuration to
Auto-Detect
.
When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage:
If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback:
Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
Open the ScreenPipe settings.
Navigate to the
Audio
or
AI/Transcription
section.
Look for the
Whisper Language
or
Transcription Language
dropdown.
Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).
If you are running ScreenPipe via CLI/Config:
You can modify your underlying configuration (usually found in
~/.screenpipe/pipe.json
or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:
(Slovak)
(Bulgarian)
(English)
If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
Good response
Bad response
Redo
Share and export
Copy
Show more options
Ask Gemini
Ask Gemini
Open upload file menu
Tools
Open mode picker
Pro
Microphone
Send message
Gemini is AI and can make mistakes, including about people.
Your privacy and Gemini Opens in a new window
Your privacy and Gemini...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"New Tab","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"New Tab","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXButton","text":"Mute tab","depth":5,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Expand","depth":21,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Yes, those","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is exactly what those files are doing sitting in your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"directory and how they interact with the rest of the system.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"1. The \"Time Machine\" Playback","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. The \"Time Machine\" Playback","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"database along with","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"precise timestamps and file path references","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. Re-processing and Model Upgrades","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. Re-processing and Model Upgrades","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The Source of Truth for Hallucinations","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The Source of Truth for Hallucinations","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file is your fallback source of truth to verify what was actually said.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How the Database Views Them","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How the Database Views Them","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you were to open up your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"and query the tables (e.g.,","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"audio_transcriptions","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), you would see that the database treats the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"soundcore AeroClip (input)_2026-05-12_07-40-48.mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"What happens if you delete them?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you manually","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"rm","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a \"file not found\" error in the background logs.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Managing the Storage Footprint","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Managing the Storage Footprint","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe records continuously, this folder will inevitably grow over time.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Built-in Garbage Collection:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Archiving:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you prefer to keep a permanent, searchable \"life log\" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"older","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Edit","depth":21,"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said yes I will do that. Is there a way to setup languages to transcription?","depth":21,"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"yes I will do that. Is there a way to setup languages to transcription?","depth":23,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As for setting up languages for transcription,","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"yes, you can configure it","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":", but it helps to understand how ScreenPipe handles language natively first.","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How the Transcription Engine Handles Language","depth":26,"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How the Transcription Engine Handles Language","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses a","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"multilingual Whisper model","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"and sets the language configuration to","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Auto-Detect","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Advantage:","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Drawback:","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How to Force a Specific Language","depth":26,"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How to Force a Specific Language","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you are using the Desktop App UI:","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Open the ScreenPipe settings.","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Navigate to the","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"AI/Transcription","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"section.","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Look for the","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper Language","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Transcription Language","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"dropdown.","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Change it from \"Auto\" to your specific language (e.g., English, Bulgarian, or Slovak).","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you are running ScreenPipe via CLI/Config:","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You can modify your underlying configuration (usually found in","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/pipe.json","depth":28,"bounds":{"left":0.0,"top":0.0,"width":0.134375,"height":0.020555556},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"(Slovak)","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"(Bulgarian)","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"(English)","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":23,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":23,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Redo","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Share and export","depth":22,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":22,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXTextArea","text":"Ask Gemini","depth":20,"on_screen":true,"value":"Ask Gemini","help_text":"","role_description":"text entry area","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Ask Gemini","depth":21,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Open upload file menu","depth":20,"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Tools","depth":18,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Open mode picker","depth":20,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Pro","depth":23,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Microphone","depth":19,"on_screen":true,"role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Send message","depth":19,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":false,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini is AI and can make mistakes, including about people.","depth":17,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXLink","text":"Your privacy and Gemini Opens in a new window","depth":17,"on_screen":true,"help_text":"","role_description":"link","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Your privacy and Gemini","depth":18,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
|
164333506082916558
|
9209073691026664413
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
New Tab
New Tab
Screenpipe — Archive
Mute tab
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Expand
Listen
Show thinking
Gemini said
Gemini said
Yes, those
.mp4
files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your
~/.screenpipe/data/
directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the
db.sqlite
database along with
precise timestamps and file path references
.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific
.mp4
file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw
.mp4
files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that
.mp4
file is your fallback source of truth to verify what was actually said.
How the Database Views Them
How the Database Views Them
If you were to open up your
db.sqlite
and query the tables (e.g.,
audio_transcriptions
), you would see that the database treats the
~/.screenpipe/data/
folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like
soundcore AeroClip (input)_2026-05-12_07-40-48.mp4
.
What happens if you delete them?
If you manually
rm
those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
Managing the Storage Footprint
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection:
ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving:
If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly
older
files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
Edit
You said yes I will do that. Is there a way to setup languages to transcription?
You said
yes I will do that. Is there a way to setup languages to transcription?
Listen
Show thinking
Gemini said
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription,
yes, you can configure it
, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
How the Transcription Engine Handles Language
By default, ScreenPipe uses a
multilingual Whisper model
and sets the language configuration to
Auto-Detect
.
When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage:
If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback:
Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
Open the ScreenPipe settings.
Navigate to the
Audio
or
AI/Transcription
section.
Look for the
Whisper Language
or
Transcription Language
dropdown.
Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).
If you are running ScreenPipe via CLI/Config:
You can modify your underlying configuration (usually found in
~/.screenpipe/pipe.json
or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:
(Slovak)
(Bulgarian)
(English)
If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
Good response
Bad response
Redo
Share and export
Copy
Show more options
Ask Gemini
Ask Gemini
Open upload file menu
Tools
Open mode picker
Pro
Microphone
Send message
Gemini is AI and can make mistakes, including about people.
Your privacy and Gemini Opens in a new window
Your privacy and Gemini...
|
38647
|
NULL
|
NULL
|
NULL
|
|
32004
|
1241
|
31
|
2026-05-13T09:03:32.434897+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-13/1778 /Users/lukas/.screenpipe/data/data/2026-05-13/1778663012434_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
New Tab
New Tab
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Expand
Listen
Show thinking
Gemini said
Gemini said
Yes, those
.mp4
files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your
~/.screenpipe/data/
directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the
db.sqlite
database along with
precise timestamps and file path references
.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific
.mp4
file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw
.mp4
files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that
.mp4
file is your fallback source of truth to verify what was actually said.
How the Database Views Them
How the Database Views Them
If you were to open up your
db.sqlite
and query the tables (e.g.,
audio_transcriptions
), you would see that the database treats the
~/.screenpipe/data/
folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like
soundcore AeroClip (input)_2026-05-12_07-40-48.mp4
.
What happens if you delete them?
If you manually
rm
those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
Managing the Storage Footprint
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection:
ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving:
If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly
older
files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
Edit
You said yes I will do that. Is there a way to setup languages to transcription?
You said
yes I will do that. Is there a way to setup languages to transcription?
Listen
Show thinking
Gemini said
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription,
yes, you can configure it
, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
How the Transcription Engine Handles Language
By default, ScreenPipe uses a
multilingual Whisper model
and sets the language configuration to
Auto-Detect
.
When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage:
If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback:
Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
Open the ScreenPipe settings.
Navigate to the
Audio
or
AI/Transcription
section.
Look for the
Whisper Language
or
Transcription Language
dropdown.
Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).
If you are running ScreenPipe via CLI/Config:
You can modify your underlying configuration (usually found in
~/.screenpipe/pipe.json
or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:
(Slovak)
(Bulgarian)
(English)
If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
Good response
Bad response
Redo
Share and export
Copy
Show more options
Ask Gemini
Ask Gemini
Open upload file menu
Tools
Open mode picker
Pro
Microphone
Send message
Gemini is AI and can make mistakes, including about people.
Your privacy and Gemini Opens in a new window
Your privacy and Gemini
Opens in a new window
Summarize page
Summarize page
Screenpipe [archive.db · 763MB]
Screenpipe
[archive.db · 763MB]
Activity
Search
Audio
Work Report
Timetable
AI Summary
Date
12
/
05
/
2026
Calendar
Monitor
Jump to
--
:
--
Go
APP TIMELINE · CLICK TO PLAY · DRAG SCROLLBAR TO PAN
−
1×
+
Follow
Follow
09:30
10:00
10:30...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"New Tab","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"New Tab","depth":5,"bounds":{"left":0.013297873,"top":0.16121309,"width":0.014960106,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.18994413,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.3463687,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.3575419,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.38068634,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.05817819,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"bounds":{"left":0.17885639,"top":0.0,"width":0.0034906915,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.21010639,"height":0.037110932},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.011303191,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"bounds":{"left":0.120678194,"top":0.0,"width":0.008144947,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"bounds":{"left":0.14960106,"top":0.0,"width":0.020944148,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.21542554,"height":0.037110932},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"bounds":{"left":0.09142287,"top":0.029928172,"width":0.028922873,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"bounds":{"left":0.12034574,"top":0.029928172,"width":0.106715426,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"bounds":{"left":0.09142287,"top":0.029928172,"width":0.21991356,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"bounds":{"left":0.09142287,"top":0.050678372,"width":0.22174202,"height":0.057861134},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"bounds":{"left":0.0787899,"top":0.13288109,"width":0.22456782,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"bounds":{"left":0.0787899,"top":0.18715084,"width":0.036402926,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"bounds":{"left":0.091755316,"top":0.19592977,"width":0.017785905,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"bounds":{"left":0.075465426,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"bounds":{"left":0.08610372,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"bounds":{"left":0.09674202,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"bounds":{"left":0.107380316,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"bounds":{"left":0.11801862,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy prompt","depth":21,"bounds":{"left":0.14029256,"top":0.292498,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":21,"bounds":{"left":0.16023937,"top":0.30207503,"width":0.13696809,"height":0.09577015},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"bounds":{"left":0.068484046,"top":0.3028731,"width":0.019946808,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":23,"bounds":{"left":0.16023937,"top":0.30367118,"width":0.13663563,"height":0.13128492},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Expand","depth":21,"bounds":{"left":0.29720744,"top":0.30207503,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Listen","depth":24,"bounds":{"left":0.3025266,"top":0.42976856,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"bounds":{"left":0.09208777,"top":0.43216282,"width":0.030917553,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"bounds":{"left":0.08976064,"top":0.4736632,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"bounds":{"left":0.08976064,"top":0.47565842,"width":0.04105718,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Yes, those","depth":27,"bounds":{"left":0.0787899,"top":0.47685555,"width":0.025930852,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"bounds":{"left":0.106715426,"top":0.47805268,"width":0.011136968,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.","depth":27,"bounds":{"left":0.0787899,"top":0.47685555,"width":0.23005319,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is exactly what those files are doing sitting in your","depth":27,"bounds":{"left":0.0787899,"top":0.5271349,"width":0.1341423,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"bounds":{"left":0.21492687,"top":0.528332,"width":0.053025264,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"directory and how they interact with the rest of the system.","depth":27,"bounds":{"left":0.0787899,"top":0.5271349,"width":0.22623006,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"1. The \"Time Machine\" Playback","depth":26,"bounds":{"left":0.0787899,"top":0.5885874,"width":0.234375,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. The \"Time Machine\" Playback","depth":27,"bounds":{"left":0.0787899,"top":0.59018356,"width":0.08194814,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the","depth":27,"bounds":{"left":0.0787899,"top":0.61652035,"width":0.20761304,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"bounds":{"left":0.24883644,"top":0.63846767,"width":0.025099734,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"database along with","depth":27,"bounds":{"left":0.0787899,"top":0.63727057,"width":0.22240691,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"precise timestamps and file path references","depth":27,"bounds":{"left":0.105053194,"top":0.65802073,"width":0.11419548,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"bounds":{"left":0.21924867,"top":0.65802073,"width":0.0013297872,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific","depth":27,"bounds":{"left":0.0787899,"top":0.6875499,"width":0.22755983,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"bounds":{"left":0.27293882,"top":0.7094972,"width":0.011136968,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.","depth":27,"bounds":{"left":0.0787899,"top":0.70830005,"width":0.23354389,"height":0.07861133},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. Re-processing and Model Upgrades","depth":26,"bounds":{"left":0.0787899,"top":0.811253,"width":0.234375,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. Re-processing and Model Upgrades","depth":27,"bounds":{"left":0.0787899,"top":0.81284916,"width":0.09840426,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw","depth":27,"bounds":{"left":0.0787899,"top":0.83918595,"width":0.23254654,"height":0.07861133},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"bounds":{"left":0.117519945,"top":0.90263367,"width":0.011136968,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.","depth":27,"bounds":{"left":0.0787899,"top":0.90143657,"width":0.23038563,"height":0.057861134},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The Source of Truth for Hallucinations","depth":26,"bounds":{"left":0.0787899,"top":0.98363924,"width":0.234375,"height":0.01636076},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The Source of Truth for Hallucinations","depth":27,"bounds":{"left":0.0787899,"top":0.98523545,"width":0.10405585,"height":0.014764547},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that","depth":27,"bounds":{"left":0.0787899,"top":1.0,"width":0.2209109,"height":-0.011572242},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"bounds":{"left":0.14660904,"top":1.0,"width":0.011303191,"height":-0.05426979},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file is your fallback source of truth to verify what was actually said.","depth":27,"bounds":{"left":0.0787899,"top":1.0,"width":0.23121676,"height":-0.05307257},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How the Database Views Them","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How the Database Views Them","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you were to open up your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"and query the tables (e.g.,","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"audio_transcriptions","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), you would see that the database treats the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"soundcore AeroClip (input)_2026-05-12_07-40-48.mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"What happens if you delete them?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you manually","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"rm","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a \"file not found\" error in the background logs.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Managing the Storage Footprint","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Managing the Storage Footprint","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe records continuously, this folder will inevitably grow over time.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Built-in Garbage Collection:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Archiving:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you prefer to keep a permanent, searchable \"life log\" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"older","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Edit","depth":21,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said yes I will do that. Is there a way to setup languages to transcription?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"yes I will do that. Is there a way to setup languages to transcription?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As for setting up languages for transcription,","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"yes, you can configure it","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":", but it helps to understand how ScreenPipe handles language natively first.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How the Transcription Engine Handles Language","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How the Transcription Engine Handles Language","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"multilingual Whisper model","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"and sets the language configuration to","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Auto-Detect","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Advantage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Drawback:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How to Force a Specific Language","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How to Force a Specific Language","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you are using the Desktop App UI:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Open the ScreenPipe settings.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Navigate to the","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"AI/Transcription","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"section.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Look for the","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper Language","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Transcription Language","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"dropdown.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Change it from \"Auto\" to your specific language (e.g., English, Bulgarian, or Slovak).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you are running ScreenPipe via CLI/Config:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You can modify your underlying configuration (usually found in","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/pipe.json","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"(Slovak)","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"(Bulgarian)","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"(English)","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":23,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":23,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Redo","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Share and export","depth":22,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":22,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXTextArea","text":"Ask Gemini","depth":20,"bounds":{"left":0.08211436,"top":0.83439744,"width":0.22573139,"height":0.01915403},"on_screen":true,"value":"Ask Gemini","help_text":"","role_description":"text entry area","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Ask Gemini","depth":21,"bounds":{"left":0.08211436,"top":0.8347965,"width":0.030086435,"height":0.018355945},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Open upload file menu","depth":20,"bounds":{"left":0.078125,"top":0.87031126,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Tools","depth":18,"bounds":{"left":0.094082445,"top":0.87031126,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Open mode picker","depth":20,"bounds":{"left":0.27044547,"top":0.867917,"width":0.026097074,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Pro","depth":23,"bounds":{"left":0.2757646,"top":0.87669593,"width":0.007480053,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Microphone","depth":19,"bounds":{"left":0.29853722,"top":0.867917,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Send message","depth":19,"bounds":{"left":0.30485374,"top":0.8671189,"width":0.013962766,"height":0.033519555},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":false,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini is AI and can make mistakes, including about people.","depth":17,"bounds":{"left":0.11702128,"top":0.92178774,"width":0.11170213,"height":0.012370312},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXLink","text":"Your privacy and Gemini Opens in a new window","depth":17,"bounds":{"left":0.2287234,"top":0.92178774,"width":0.044215426,"height":0.012370312},"on_screen":true,"help_text":"","role_description":"link","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Your privacy and Gemini","depth":18,"bounds":{"left":0.2287234,"top":0.92178774,"width":0.044215426,"height":0.012370312},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Opens in a new window","depth":19,"bounds":{"left":0.068484046,"top":0.92098963,"width":0.043218084,"height":0.012370312},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Summarize page","depth":7,"bounds":{"left":0.07413564,"top":0.95730245,"width":0.053523935,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Summarize page","depth":9,"bounds":{"left":0.07978723,"top":0.96249,"width":0.042220745,"height":0.015163607},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Screenpipe [archive.db · 763MB]","depth":7,"bounds":{"left":0.33061835,"top":0.061452515,"width":0.06067154,"height":0.017956903},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Screenpipe","depth":8,"bounds":{"left":0.33061835,"top":0.06304868,"width":0.027759308,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"[archive.db · 763MB]","depth":9,"bounds":{"left":0.35970744,"top":0.06703911,"width":0.03158245,"height":0.009976057},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Activity","depth":7,"bounds":{"left":0.39594415,"top":0.059856344,"width":0.024933511,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Search","depth":7,"bounds":{"left":0.42154256,"top":0.059856344,"width":0.023603724,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Audio","depth":7,"bounds":{"left":0.44581118,"top":0.059856344,"width":0.020944148,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Work Report","depth":7,"bounds":{"left":0.46742022,"top":0.059856344,"width":0.03507314,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Timetable","depth":7,"bounds":{"left":0.5031583,"top":0.059856344,"width":0.029753989,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Summary","depth":7,"bounds":{"left":0.53357714,"top":0.059856344,"width":0.034075797,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Date","depth":8,"bounds":{"left":0.93866354,"top":0.0650439,"width":0.008144947,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"12","depth":9,"bounds":{"left":0.95545214,"top":0.06464485,"width":0.0048204786,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"/","depth":8,"bounds":{"left":0.96127,"top":0.06464485,"width":0.0023271276,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"05","depth":9,"bounds":{"left":0.9645944,"top":0.06464485,"width":0.0048204786,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"/","depth":8,"bounds":{"left":0.97041225,"top":0.06464485,"width":0.002493351,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2026","depth":9,"bounds":{"left":0.97390294,"top":0.06464485,"width":0.009474734,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Calendar","depth":8,"bounds":{"left":0.9847075,"top":0.0650439,"width":0.0051529254,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXStaticText","text":"Monitor","depth":9,"bounds":{"left":0.45262632,"top":0.10853951,"width":0.013464096,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Jump to","depth":9,"bounds":{"left":0.8111702,"top":0.10853951,"width":0.01412899,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"--","depth":10,"bounds":{"left":0.8312833,"top":0.10814046,"width":0.0048204786,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":":","depth":9,"bounds":{"left":0.83710104,"top":0.10814046,"width":0.0023271276,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"--","depth":10,"bounds":{"left":0.84042555,"top":0.10814046,"width":0.0048204786,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Go","depth":8,"bounds":{"left":0.85920876,"top":0.10454908,"width":0.012300532,"height":0.018754989},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"APP TIMELINE · CLICK TO PLAY · DRAG SCROLLBAR TO PAN","depth":10,"bounds":{"left":0.45761302,"top":0.14964086,"width":0.10571808,"height":0.009976057},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"−","depth":9,"bounds":{"left":0.813996,"top":0.1452514,"width":0.009807181,"height":0.018754989},"on_screen":true,"help_text":"Zoom out","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"1×","depth":10,"bounds":{"left":0.82795876,"top":0.14924182,"width":0.004155585,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"+","depth":9,"bounds":{"left":0.83643615,"top":0.1452514,"width":0.009640957,"height":0.018754989},"on_screen":true,"help_text":"Zoom in","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Follow","depth":10,"bounds":{"left":0.8494016,"top":0.14924182,"width":0.004654255,"height":0.011173184},"on_screen":true,"help_text":"","role_description":"checkbox","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Follow","depth":10,"bounds":{"left":0.85538566,"top":0.14924182,"width":0.011136968,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"09:30","depth":13,"bounds":{"left":0.45910904,"top":0.21947326,"width":0.008643617,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"10:00","depth":13,"bounds":{"left":0.47573137,"top":0.21947326,"width":0.008144947,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"10:30","depth":13,"bounds":{"left":0.4921875,"top":0.21947326,"width":0.008144947,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
|
-8375902077149695819
|
9209073141279239133
|
visual_change
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
New Tab
New Tab
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Expand
Listen
Show thinking
Gemini said
Gemini said
Yes, those
.mp4
files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your
~/.screenpipe/data/
directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the
db.sqlite
database along with
precise timestamps and file path references
.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific
.mp4
file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw
.mp4
files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that
.mp4
file is your fallback source of truth to verify what was actually said.
How the Database Views Them
How the Database Views Them
If you were to open up your
db.sqlite
and query the tables (e.g.,
audio_transcriptions
), you would see that the database treats the
~/.screenpipe/data/
folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like
soundcore AeroClip (input)_2026-05-12_07-40-48.mp4
.
What happens if you delete them?
If you manually
rm
those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
Managing the Storage Footprint
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection:
ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving:
If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly
older
files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
Edit
You said yes I will do that. Is there a way to setup languages to transcription?
You said
yes I will do that. Is there a way to setup languages to transcription?
Listen
Show thinking
Gemini said
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription,
yes, you can configure it
, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
How the Transcription Engine Handles Language
By default, ScreenPipe uses a
multilingual Whisper model
and sets the language configuration to
Auto-Detect
.
When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage:
If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback:
Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
Open the ScreenPipe settings.
Navigate to the
Audio
or
AI/Transcription
section.
Look for the
Whisper Language
or
Transcription Language
dropdown.
Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).
If you are running ScreenPipe via CLI/Config:
You can modify your underlying configuration (usually found in
~/.screenpipe/pipe.json
or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:
(Slovak)
(Bulgarian)
(English)
If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
Good response
Bad response
Redo
Share and export
Copy
Show more options
Ask Gemini
Ask Gemini
Open upload file menu
Tools
Open mode picker
Pro
Microphone
Send message
Gemini is AI and can make mistakes, including about people.
Your privacy and Gemini Opens in a new window
Your privacy and Gemini
Opens in a new window
Summarize page
Summarize page
Screenpipe [archive.db · 763MB]
Screenpipe
[archive.db · 763MB]
Activity
Search
Audio
Work Report
Timetable
AI Summary
Date
12
/
05
/
2026
Calendar
Monitor
Jump to
--
:
--
Go
APP TIMELINE · CLICK TO PLAY · DRAG SCROLLBAR TO PAN
−
1×
+
Follow
Follow
09:30
10:00
10:30...
|
32003
|
NULL
|
NULL
|
NULL
|
|
32054
|
1241
|
55
|
2026-05-13T09:04:24.570649+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-13/1778 /Users/lukas/.screenpipe/data/data/2026-05-13/1778663064570_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
New Tab
New Tab
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Expand
Listen
Show thinking
Gemini said
Gemini said
Yes, those
.mp4
files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your
~/.screenpipe/data/
directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the
db.sqlite
database along with
precise timestamps and file path references
.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific
.mp4
file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw
.mp4
files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that
.mp4
file is your fallback source of truth to verify what was actually said.
How the Database Views Them
How the Database Views Them
If you were to open up your
db.sqlite
and query the tables (e.g.,
audio_transcriptions
), you would see that the database treats the
~/.screenpipe/data/
folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like
soundcore AeroClip (input)_2026-05-12_07-40-48.mp4
.
What happens if you delete them?
If you manually
rm
those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
Managing the Storage Footprint
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection:
ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving:
If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly
older
files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
Edit
You said yes I will do that. Is there a way to setup languages to transcription?
You said
yes I will do that. Is there a way to setup languages to transcription?
Listen
Show thinking
Gemini said
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription,
yes, you can configure it
, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
How the Transcription Engine Handles Language
By default, ScreenPipe uses a
multilingual Whisper model
and sets the language configuration to
Auto-Detect
.
When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage:
If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback:
Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
Open the ScreenPipe settings.
Navigate to the
Audio
or
AI/Transcription
section.
Look for the
Whisper Language
or
Transcription Language
dropdown.
Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).
If you are running ScreenPipe via CLI/Config:
You can modify your underlying configuration (usually found in
~/.screenpipe/pipe.json
or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:
(Slovak)
(Bulgarian)
(English)
If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
Good response
Bad response
Redo
Share and export
Copy
Show more options
Ask Gemini
Ask Gemini
Open upload file menu
Tools
Open mode picker
Pro
Microphone
Send message
Gemini is AI and can make mistakes, including about people.
Your privacy and Gemini Opens in a new window
Your privacy and Gemini
Opens in a new window
Summarize page
Summarize page
Screenpipe [archive.db · 763MB]
Screenpipe
[archive.db · 763MB]
Activity
Search
Audio
Work Report
Timetable
AI Summary
Date
12
/
05
/
2026
Calendar
Monitor
Jump to
--
:
--
Go
APP TIMELINE · CLICK TO PLAY · DRAG SCROLLBAR TO PAN
−
1×
+
Follow
Follow
09:30
10:00
10:30
11:00
11:30
12:00
12:30
13:00
13:30
14:00
14:30
15:00
15:30
16:00
16:30
17:00
17:30...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"New Tab","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"New Tab","depth":5,"bounds":{"left":0.013297873,"top":0.16121309,"width":0.014960106,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.18994413,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.3463687,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.3575419,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.38068634,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.05817819,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"bounds":{"left":0.17885639,"top":0.0,"width":0.0034906915,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.21010639,"height":0.037110932},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.011303191,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"bounds":{"left":0.120678194,"top":0.0,"width":0.008144947,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"bounds":{"left":0.14960106,"top":0.0,"width":0.020944148,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.21542554,"height":0.037110932},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"bounds":{"left":0.09142287,"top":0.029928172,"width":0.028922873,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"bounds":{"left":0.12034574,"top":0.029928172,"width":0.106715426,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"bounds":{"left":0.09142287,"top":0.029928172,"width":0.21991356,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"bounds":{"left":0.09142287,"top":0.050678372,"width":0.22174202,"height":0.057861134},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"bounds":{"left":0.0787899,"top":0.13288109,"width":0.22456782,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"bounds":{"left":0.0787899,"top":0.18715084,"width":0.036402926,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"bounds":{"left":0.091755316,"top":0.19592977,"width":0.017785905,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"bounds":{"left":0.075465426,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"bounds":{"left":0.08610372,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"bounds":{"left":0.09674202,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"bounds":{"left":0.107380316,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"bounds":{"left":0.11801862,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy prompt","depth":21,"bounds":{"left":0.14029256,"top":0.292498,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":21,"bounds":{"left":0.16023937,"top":0.30207503,"width":0.13696809,"height":0.09577015},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"bounds":{"left":0.068484046,"top":0.3028731,"width":0.019946808,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":23,"bounds":{"left":0.16023937,"top":0.30367118,"width":0.13663563,"height":0.13128492},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Expand","depth":21,"bounds":{"left":0.29720744,"top":0.30207503,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Listen","depth":24,"bounds":{"left":0.3025266,"top":0.42976856,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"bounds":{"left":0.09208777,"top":0.43216282,"width":0.030917553,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"bounds":{"left":0.08976064,"top":0.4736632,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"bounds":{"left":0.08976064,"top":0.47565842,"width":0.04105718,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Yes, those","depth":27,"bounds":{"left":0.0787899,"top":0.47685555,"width":0.025930852,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"bounds":{"left":0.106715426,"top":0.47805268,"width":0.011136968,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.","depth":27,"bounds":{"left":0.0787899,"top":0.47685555,"width":0.23005319,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is exactly what those files are doing sitting in your","depth":27,"bounds":{"left":0.0787899,"top":0.5271349,"width":0.1341423,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"bounds":{"left":0.21492687,"top":0.528332,"width":0.053025264,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"directory and how they interact with the rest of the system.","depth":27,"bounds":{"left":0.0787899,"top":0.5271349,"width":0.22623006,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"1. The \"Time Machine\" Playback","depth":26,"bounds":{"left":0.0787899,"top":0.5885874,"width":0.234375,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. The \"Time Machine\" Playback","depth":27,"bounds":{"left":0.0787899,"top":0.59018356,"width":0.08194814,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the","depth":27,"bounds":{"left":0.0787899,"top":0.61652035,"width":0.20761304,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"bounds":{"left":0.24883644,"top":0.63846767,"width":0.025099734,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"database along with","depth":27,"bounds":{"left":0.0787899,"top":0.63727057,"width":0.22240691,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"precise timestamps and file path references","depth":27,"bounds":{"left":0.105053194,"top":0.65802073,"width":0.11419548,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"bounds":{"left":0.21924867,"top":0.65802073,"width":0.0013297872,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific","depth":27,"bounds":{"left":0.0787899,"top":0.6875499,"width":0.22755983,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"bounds":{"left":0.27293882,"top":0.7094972,"width":0.011136968,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.","depth":27,"bounds":{"left":0.0787899,"top":0.70830005,"width":0.23354389,"height":0.07861133},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. Re-processing and Model Upgrades","depth":26,"bounds":{"left":0.0787899,"top":0.811253,"width":0.234375,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. Re-processing and Model Upgrades","depth":27,"bounds":{"left":0.0787899,"top":0.81284916,"width":0.09840426,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw","depth":27,"bounds":{"left":0.0787899,"top":0.83918595,"width":0.23254654,"height":0.07861133},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"bounds":{"left":0.117519945,"top":0.90263367,"width":0.011136968,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.","depth":27,"bounds":{"left":0.0787899,"top":0.90143657,"width":0.23038563,"height":0.057861134},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The Source of Truth for Hallucinations","depth":26,"bounds":{"left":0.0787899,"top":0.98363924,"width":0.234375,"height":0.01636076},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The Source of Truth for Hallucinations","depth":27,"bounds":{"left":0.0787899,"top":0.98523545,"width":0.10405585,"height":0.014764547},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that","depth":27,"bounds":{"left":0.0787899,"top":1.0,"width":0.2209109,"height":-0.011572242},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"bounds":{"left":0.14660904,"top":1.0,"width":0.011303191,"height":-0.05426979},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file is your fallback source of truth to verify what was actually said.","depth":27,"bounds":{"left":0.0787899,"top":1.0,"width":0.23121676,"height":-0.05307257},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How the Database Views Them","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How the Database Views Them","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you were to open up your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"and query the tables (e.g.,","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"audio_transcriptions","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), you would see that the database treats the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"soundcore AeroClip (input)_2026-05-12_07-40-48.mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"What happens if you delete them?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you manually","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"rm","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a \"file not found\" error in the background logs.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Managing the Storage Footprint","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Managing the Storage Footprint","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe records continuously, this folder will inevitably grow over time.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Built-in Garbage Collection:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Archiving:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you prefer to keep a permanent, searchable \"life log\" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"older","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Edit","depth":21,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said yes I will do that. Is there a way to setup languages to transcription?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"yes I will do that. Is there a way to setup languages to transcription?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As for setting up languages for transcription,","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"yes, you can configure it","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":", but it helps to understand how ScreenPipe handles language natively first.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How the Transcription Engine Handles Language","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How the Transcription Engine Handles Language","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"multilingual Whisper model","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"and sets the language configuration to","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Auto-Detect","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Advantage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Drawback:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How to Force a Specific Language","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How to Force a Specific Language","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you are using the Desktop App UI:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Open the ScreenPipe settings.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Navigate to the","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"AI/Transcription","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"section.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Look for the","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper Language","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Transcription Language","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"dropdown.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Change it from \"Auto\" to your specific language (e.g., English, Bulgarian, or Slovak).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you are running ScreenPipe via CLI/Config:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You can modify your underlying configuration (usually found in","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/pipe.json","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"(Slovak)","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"(Bulgarian)","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"(English)","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":23,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":23,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Redo","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Share and export","depth":22,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":22,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXTextArea","text":"Ask Gemini","depth":20,"bounds":{"left":0.08211436,"top":0.83439744,"width":0.22573139,"height":0.01915403},"on_screen":true,"value":"Ask Gemini","help_text":"","role_description":"text entry area","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Ask Gemini","depth":21,"bounds":{"left":0.08211436,"top":0.8347965,"width":0.030086435,"height":0.018355945},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Open upload file menu","depth":20,"bounds":{"left":0.078125,"top":0.87031126,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Tools","depth":18,"bounds":{"left":0.094082445,"top":0.87031126,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Open mode picker","depth":20,"bounds":{"left":0.27044547,"top":0.867917,"width":0.026097074,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Pro","depth":23,"bounds":{"left":0.2757646,"top":0.87669593,"width":0.007480053,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Microphone","depth":19,"bounds":{"left":0.29853722,"top":0.867917,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Send message","depth":19,"bounds":{"left":0.30485374,"top":0.8671189,"width":0.013962766,"height":0.033519555},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":false,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini is AI and can make mistakes, including about people.","depth":17,"bounds":{"left":0.11702128,"top":0.92178774,"width":0.11170213,"height":0.012370312},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXLink","text":"Your privacy and Gemini Opens in a new window","depth":17,"bounds":{"left":0.2287234,"top":0.92178774,"width":0.044215426,"height":0.012370312},"on_screen":true,"help_text":"","role_description":"link","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Your privacy and Gemini","depth":18,"bounds":{"left":0.2287234,"top":0.92178774,"width":0.044215426,"height":0.012370312},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Opens in a new window","depth":19,"bounds":{"left":0.068484046,"top":0.92098963,"width":0.043218084,"height":0.012370312},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Summarize page","depth":7,"bounds":{"left":0.07413564,"top":0.95730245,"width":0.053523935,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Summarize page","depth":9,"bounds":{"left":0.07978723,"top":0.96249,"width":0.042220745,"height":0.015163607},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Screenpipe [archive.db · 763MB]","depth":7,"bounds":{"left":0.33061835,"top":0.061452515,"width":0.06067154,"height":0.017956903},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Screenpipe","depth":8,"bounds":{"left":0.33061835,"top":0.06304868,"width":0.027759308,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"[archive.db · 763MB]","depth":9,"bounds":{"left":0.35970744,"top":0.06703911,"width":0.03158245,"height":0.009976057},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Activity","depth":7,"bounds":{"left":0.39594415,"top":0.059856344,"width":0.024933511,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Search","depth":7,"bounds":{"left":0.42154256,"top":0.059856344,"width":0.023603724,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Audio","depth":7,"bounds":{"left":0.44581118,"top":0.059856344,"width":0.020944148,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Work Report","depth":7,"bounds":{"left":0.46742022,"top":0.059856344,"width":0.03507314,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Timetable","depth":7,"bounds":{"left":0.5031583,"top":0.059856344,"width":0.029753989,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Summary","depth":7,"bounds":{"left":0.53357714,"top":0.059856344,"width":0.034075797,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Date","depth":8,"bounds":{"left":0.93866354,"top":0.0650439,"width":0.008144947,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"12","depth":9,"bounds":{"left":0.95545214,"top":0.06464485,"width":0.0048204786,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"/","depth":8,"bounds":{"left":0.96127,"top":0.06464485,"width":0.0023271276,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"05","depth":9,"bounds":{"left":0.9645944,"top":0.06464485,"width":0.0048204786,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"/","depth":8,"bounds":{"left":0.97041225,"top":0.06464485,"width":0.002493351,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2026","depth":9,"bounds":{"left":0.97390294,"top":0.06464485,"width":0.009474734,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Calendar","depth":8,"bounds":{"left":0.9847075,"top":0.0650439,"width":0.0051529254,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXStaticText","text":"Monitor","depth":9,"bounds":{"left":0.45262632,"top":0.10853951,"width":0.013464096,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Jump to","depth":9,"bounds":{"left":0.8111702,"top":0.10853951,"width":0.01412899,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"--","depth":10,"bounds":{"left":0.8312833,"top":0.10814046,"width":0.0048204786,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":":","depth":9,"bounds":{"left":0.83710104,"top":0.10814046,"width":0.0023271276,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"--","depth":10,"bounds":{"left":0.84042555,"top":0.10814046,"width":0.0048204786,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Go","depth":8,"bounds":{"left":0.85920876,"top":0.10454908,"width":0.012300532,"height":0.018754989},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"APP TIMELINE · CLICK TO PLAY · DRAG SCROLLBAR TO PAN","depth":10,"bounds":{"left":0.45761302,"top":0.14964086,"width":0.10571808,"height":0.009976057},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"−","depth":9,"bounds":{"left":0.813996,"top":0.1452514,"width":0.009807181,"height":0.018754989},"on_screen":true,"help_text":"Zoom out","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"1×","depth":10,"bounds":{"left":0.82795876,"top":0.14924182,"width":0.004155585,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"+","depth":9,"bounds":{"left":0.83643615,"top":0.1452514,"width":0.009640957,"height":0.018754989},"on_screen":true,"help_text":"Zoom in","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Follow","depth":10,"bounds":{"left":0.8494016,"top":0.14924182,"width":0.004654255,"height":0.011173184},"on_screen":true,"help_text":"","role_description":"checkbox","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Follow","depth":10,"bounds":{"left":0.85538566,"top":0.14924182,"width":0.011136968,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"09:30","depth":13,"bounds":{"left":0.45894283,"top":0.21947326,"width":0.00880984,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"10:00","depth":13,"bounds":{"left":0.47573137,"top":0.21947326,"width":0.008144947,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"10:30","depth":13,"bounds":{"left":0.4921875,"top":0.21947326,"width":0.008144947,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"11:00","depth":13,"bounds":{"left":0.50880986,"top":0.21947326,"width":0.0078125,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"11:30","depth":13,"bounds":{"left":0.52526593,"top":0.21947326,"width":0.0078125,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"12:00","depth":13,"bounds":{"left":0.5415558,"top":0.21947326,"width":0.008144947,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"12:30","depth":13,"bounds":{"left":0.55801195,"top":0.21947326,"width":0.008144947,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"13:00","depth":13,"bounds":{"left":0.57430184,"top":0.21947326,"width":0.00831117,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"13:30","depth":13,"bounds":{"left":0.59075797,"top":0.21947326,"width":0.00831117,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"14:00","depth":13,"bounds":{"left":0.6072141,"top":0.21947326,"width":0.00831117,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"14:30","depth":13,"bounds":{"left":0.6236702,"top":0.21947326,"width":0.00831117,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"15:00","depth":13,"bounds":{"left":0.64012635,"top":0.21947326,"width":0.00831117,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"15:30","depth":13,"bounds":{"left":0.6565825,"top":0.21947326,"width":0.008144947,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"16:00","depth":13,"bounds":{"left":0.67303854,"top":0.21947326,"width":0.008144947,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"16:30","depth":13,"bounds":{"left":0.68949467,"top":0.21947326,"width":0.008144947,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"17:00","depth":13,"bounds":{"left":0.70611703,"top":0.21947326,"width":0.007978723,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"17:30","depth":13,"bounds":{"left":0.72257316,"top":0.21947326,"width":0.007978723,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
|
428773019499955289
|
9209073141279239133
|
visual_change
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
New Tab
New Tab
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Expand
Listen
Show thinking
Gemini said
Gemini said
Yes, those
.mp4
files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your
~/.screenpipe/data/
directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the
db.sqlite
database along with
precise timestamps and file path references
.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific
.mp4
file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw
.mp4
files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that
.mp4
file is your fallback source of truth to verify what was actually said.
How the Database Views Them
How the Database Views Them
If you were to open up your
db.sqlite
and query the tables (e.g.,
audio_transcriptions
), you would see that the database treats the
~/.screenpipe/data/
folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like
soundcore AeroClip (input)_2026-05-12_07-40-48.mp4
.
What happens if you delete them?
If you manually
rm
those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
Managing the Storage Footprint
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection:
ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving:
If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly
older
files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
Edit
You said yes I will do that. Is there a way to setup languages to transcription?
You said
yes I will do that. Is there a way to setup languages to transcription?
Listen
Show thinking
Gemini said
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription,
yes, you can configure it
, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
How the Transcription Engine Handles Language
By default, ScreenPipe uses a
multilingual Whisper model
and sets the language configuration to
Auto-Detect
.
When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage:
If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback:
Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
Open the ScreenPipe settings.
Navigate to the
Audio
or
AI/Transcription
section.
Look for the
Whisper Language
or
Transcription Language
dropdown.
Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).
If you are running ScreenPipe via CLI/Config:
You can modify your underlying configuration (usually found in
~/.screenpipe/pipe.json
or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:
(Slovak)
(Bulgarian)
(English)
If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
Good response
Bad response
Redo
Share and export
Copy
Show more options
Ask Gemini
Ask Gemini
Open upload file menu
Tools
Open mode picker
Pro
Microphone
Send message
Gemini is AI and can make mistakes, including about people.
Your privacy and Gemini Opens in a new window
Your privacy and Gemini
Opens in a new window
Summarize page
Summarize page
Screenpipe [archive.db · 763MB]
Screenpipe
[archive.db · 763MB]
Activity
Search
Audio
Work Report
Timetable
AI Summary
Date
12
/
05
/
2026
Calendar
Monitor
Jump to
--
:
--
Go
APP TIMELINE · CLICK TO PLAY · DRAG SCROLLBAR TO PAN
−
1×
+
Follow
Follow
09:30
10:00
10:30
11:00
11:30
12:00
12:30
13:00
13:30
14:00
14:30
15:00
15:30
16:00
16:30
17:00
17:30...
|
32053
|
NULL
|
NULL
|
NULL
|
|
32055
|
1241
|
56
|
2026-05-13T09:04:27.606130+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-13/1778 /Users/lukas/.screenpipe/data/data/2026-05-13/1778663067606_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
New Tab
New Tab
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Expand
Listen
Show thinking
Gemini said
Gemini said
Yes, those
.mp4
files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your
~/.screenpipe/data/
directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the
db.sqlite
database along with
precise timestamps and file path references
.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific
.mp4
file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw
.mp4
files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that
.mp4
file is your fallback source of truth to verify what was actually said.
How the Database Views Them
How the Database Views Them
If you were to open up your
db.sqlite
and query the tables (e.g.,
audio_transcriptions
), you would see that the database treats the
~/.screenpipe/data/
folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like
soundcore AeroClip (input)_2026-05-12_07-40-48.mp4
.
What happens if you delete them?
If you manually
rm
those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
Managing the Storage Footprint
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection:
ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving:
If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly
older
files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
Edit
You said yes I will do that. Is there a way to setup languages to transcription?
You said
yes I will do that. Is there a way to setup languages to transcription?
Listen
Show thinking
Gemini said
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription,
yes, you can configure it
, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
How the Transcription Engine Handles Language
By default, ScreenPipe uses a
multilingual Whisper model
and sets the language configuration to
Auto-Detect
.
When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage:
If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback:
Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
Open the ScreenPipe settings.
Navigate to the
Audio
or
AI/Transcription
section.
Look for the
Whisper Language
or
Transcription Language
dropdown.
Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).
If you are running ScreenPipe via CLI/Config:
You can modify your underlying configuration (usually found in
~/.screenpipe/pipe.json
or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:
(Slovak)
(Bulgarian)
(English)
If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
Good response
Bad response
Redo
Share and export
Copy
Show more options
Ask Gemini
Ask Gemini
Open upload file menu
Tools
Open mode picker
Pro
Microphone
Send message
Gemini is AI and can make mistakes, including about people.
Your privacy and Gemini Opens in a new window
Your privacy and Gemini
Opens in a new window
Summarize page
Summarize page
Screenpipe [archive.db · 763MB]
Screenpipe
[archive.db · 763MB]
Activity
Search
Audio
Work Report
Timetable
AI Summary
Date
12
/
05
/
2026
Calendar
Monitor
Jump to
--
:
--
Go
APP TIMELINE · CLICK TO PLAY · DRAG SCROLLBAR TO PAN
−
1×
+
Follow
Follow
09:30
10:00
10:30
11:00
11:30
12:00
12:30
13:00
13:30
14:00
14:30
15:00
15:30
16:00
16:30
17:00
17:30
18:00
18:30
19:00
19:30
20:00
20:30...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"New Tab","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"New Tab","depth":5,"bounds":{"left":0.013297873,"top":0.16121309,"width":0.014960106,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.18994413,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.3463687,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.3575419,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.38068634,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.05817819,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"bounds":{"left":0.17885639,"top":0.0,"width":0.0034906915,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.21010639,"height":0.037110932},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.011303191,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"bounds":{"left":0.120678194,"top":0.0,"width":0.008144947,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"bounds":{"left":0.14960106,"top":0.0,"width":0.020944148,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.21542554,"height":0.037110932},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"bounds":{"left":0.09142287,"top":0.029928172,"width":0.028922873,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"bounds":{"left":0.12034574,"top":0.029928172,"width":0.106715426,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"bounds":{"left":0.09142287,"top":0.029928172,"width":0.21991356,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"bounds":{"left":0.09142287,"top":0.050678372,"width":0.22174202,"height":0.057861134},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"bounds":{"left":0.0787899,"top":0.13288109,"width":0.22456782,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"bounds":{"left":0.0787899,"top":0.18715084,"width":0.036402926,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"bounds":{"left":0.091755316,"top":0.19592977,"width":0.017785905,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"bounds":{"left":0.075465426,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"bounds":{"left":0.08610372,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"bounds":{"left":0.09674202,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"bounds":{"left":0.107380316,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"bounds":{"left":0.11801862,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy prompt","depth":21,"bounds":{"left":0.14029256,"top":0.292498,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":21,"bounds":{"left":0.16023937,"top":0.30207503,"width":0.13696809,"height":0.09577015},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"bounds":{"left":0.068484046,"top":0.3028731,"width":0.019946808,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":23,"bounds":{"left":0.16023937,"top":0.30367118,"width":0.13663563,"height":0.13128492},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Expand","depth":21,"bounds":{"left":0.29720744,"top":0.30207503,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Listen","depth":24,"bounds":{"left":0.3025266,"top":0.42976856,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"bounds":{"left":0.09208777,"top":0.43216282,"width":0.030917553,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"bounds":{"left":0.08976064,"top":0.4736632,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"bounds":{"left":0.08976064,"top":0.47565842,"width":0.04105718,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Yes, those","depth":27,"bounds":{"left":0.0787899,"top":0.47685555,"width":0.025930852,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"bounds":{"left":0.106715426,"top":0.47805268,"width":0.011136968,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.","depth":27,"bounds":{"left":0.0787899,"top":0.47685555,"width":0.23005319,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is exactly what those files are doing sitting in your","depth":27,"bounds":{"left":0.0787899,"top":0.5271349,"width":0.1341423,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"bounds":{"left":0.21492687,"top":0.528332,"width":0.053025264,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"directory and how they interact with the rest of the system.","depth":27,"bounds":{"left":0.0787899,"top":0.5271349,"width":0.22623006,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"1. The \"Time Machine\" Playback","depth":26,"bounds":{"left":0.0787899,"top":0.5885874,"width":0.234375,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. The \"Time Machine\" Playback","depth":27,"bounds":{"left":0.0787899,"top":0.59018356,"width":0.08194814,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the","depth":27,"bounds":{"left":0.0787899,"top":0.61652035,"width":0.20761304,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"bounds":{"left":0.24883644,"top":0.63846767,"width":0.025099734,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"database along with","depth":27,"bounds":{"left":0.0787899,"top":0.63727057,"width":0.22240691,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"precise timestamps and file path references","depth":27,"bounds":{"left":0.105053194,"top":0.65802073,"width":0.11419548,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"bounds":{"left":0.21924867,"top":0.65802073,"width":0.0013297872,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific","depth":27,"bounds":{"left":0.0787899,"top":0.6875499,"width":0.22755983,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"bounds":{"left":0.27293882,"top":0.7094972,"width":0.011136968,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.","depth":27,"bounds":{"left":0.0787899,"top":0.70830005,"width":0.23354389,"height":0.07861133},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. Re-processing and Model Upgrades","depth":26,"bounds":{"left":0.0787899,"top":0.811253,"width":0.234375,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. Re-processing and Model Upgrades","depth":27,"bounds":{"left":0.0787899,"top":0.81284916,"width":0.09840426,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw","depth":27,"bounds":{"left":0.0787899,"top":0.83918595,"width":0.23254654,"height":0.07861133},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"bounds":{"left":0.117519945,"top":0.90263367,"width":0.011136968,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.","depth":27,"bounds":{"left":0.0787899,"top":0.90143657,"width":0.23038563,"height":0.057861134},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The Source of Truth for Hallucinations","depth":26,"bounds":{"left":0.0787899,"top":0.98363924,"width":0.234375,"height":0.01636076},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The Source of Truth for Hallucinations","depth":27,"bounds":{"left":0.0787899,"top":0.98523545,"width":0.10405585,"height":0.014764547},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that","depth":27,"bounds":{"left":0.0787899,"top":1.0,"width":0.2209109,"height":-0.011572242},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"bounds":{"left":0.14660904,"top":1.0,"width":0.011303191,"height":-0.05426979},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file is your fallback source of truth to verify what was actually said.","depth":27,"bounds":{"left":0.0787899,"top":1.0,"width":0.23121676,"height":-0.05307257},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How the Database Views Them","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How the Database Views Them","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you were to open up your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"and query the tables (e.g.,","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"audio_transcriptions","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), you would see that the database treats the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"soundcore AeroClip (input)_2026-05-12_07-40-48.mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"What happens if you delete them?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you manually","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"rm","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a \"file not found\" error in the background logs.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Managing the Storage Footprint","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Managing the Storage Footprint","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe records continuously, this folder will inevitably grow over time.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Built-in Garbage Collection:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Archiving:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you prefer to keep a permanent, searchable \"life log\" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"older","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Edit","depth":21,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said yes I will do that. Is there a way to setup languages to transcription?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"yes I will do that. Is there a way to setup languages to transcription?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As for setting up languages for transcription,","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"yes, you can configure it","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":", but it helps to understand how ScreenPipe handles language natively first.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How the Transcription Engine Handles Language","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How the Transcription Engine Handles Language","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"multilingual Whisper model","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"and sets the language configuration to","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Auto-Detect","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Advantage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Drawback:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How to Force a Specific Language","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How to Force a Specific Language","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you are using the Desktop App UI:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Open the ScreenPipe settings.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Navigate to the","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"AI/Transcription","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"section.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Look for the","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper Language","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Transcription Language","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"dropdown.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Change it from \"Auto\" to your specific language (e.g., English, Bulgarian, or Slovak).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you are running ScreenPipe via CLI/Config:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You can modify your underlying configuration (usually found in","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/pipe.json","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"(Slovak)","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"(Bulgarian)","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"(English)","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":23,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":23,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Redo","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Share and export","depth":22,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":22,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXTextArea","text":"Ask Gemini","depth":20,"bounds":{"left":0.08211436,"top":0.83439744,"width":0.22573139,"height":0.01915403},"on_screen":true,"value":"Ask Gemini","help_text":"","role_description":"text entry area","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Ask Gemini","depth":21,"bounds":{"left":0.08211436,"top":0.8347965,"width":0.030086435,"height":0.018355945},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Open upload file menu","depth":20,"bounds":{"left":0.078125,"top":0.87031126,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Tools","depth":18,"bounds":{"left":0.094082445,"top":0.87031126,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Open mode picker","depth":20,"bounds":{"left":0.27044547,"top":0.867917,"width":0.026097074,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Pro","depth":23,"bounds":{"left":0.2757646,"top":0.87669593,"width":0.007480053,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Microphone","depth":19,"bounds":{"left":0.29853722,"top":0.867917,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Send message","depth":19,"bounds":{"left":0.30485374,"top":0.8671189,"width":0.013962766,"height":0.033519555},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":false,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini is AI and can make mistakes, including about people.","depth":17,"bounds":{"left":0.11702128,"top":0.92178774,"width":0.11170213,"height":0.012370312},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXLink","text":"Your privacy and Gemini Opens in a new window","depth":17,"bounds":{"left":0.2287234,"top":0.92178774,"width":0.044215426,"height":0.012370312},"on_screen":true,"help_text":"","role_description":"link","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Your privacy and Gemini","depth":18,"bounds":{"left":0.2287234,"top":0.92178774,"width":0.044215426,"height":0.012370312},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Opens in a new window","depth":19,"bounds":{"left":0.068484046,"top":0.92098963,"width":0.043218084,"height":0.012370312},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Summarize page","depth":7,"bounds":{"left":0.07413564,"top":0.95730245,"width":0.053523935,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Summarize page","depth":9,"bounds":{"left":0.07978723,"top":0.96249,"width":0.042220745,"height":0.015163607},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Screenpipe [archive.db · 763MB]","depth":7,"bounds":{"left":0.33061835,"top":0.061452515,"width":0.06067154,"height":0.017956903},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Screenpipe","depth":8,"bounds":{"left":0.33061835,"top":0.06304868,"width":0.027759308,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"[archive.db · 763MB]","depth":9,"bounds":{"left":0.35970744,"top":0.06703911,"width":0.03158245,"height":0.009976057},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Activity","depth":7,"bounds":{"left":0.39594415,"top":0.059856344,"width":0.024933511,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Search","depth":7,"bounds":{"left":0.42154256,"top":0.059856344,"width":0.023603724,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Audio","depth":7,"bounds":{"left":0.44581118,"top":0.059856344,"width":0.020944148,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Work Report","depth":7,"bounds":{"left":0.46742022,"top":0.059856344,"width":0.03507314,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Timetable","depth":7,"bounds":{"left":0.5031583,"top":0.059856344,"width":0.029753989,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Summary","depth":7,"bounds":{"left":0.53357714,"top":0.059856344,"width":0.034075797,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Date","depth":8,"bounds":{"left":0.93866354,"top":0.0650439,"width":0.008144947,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"12","depth":9,"bounds":{"left":0.95545214,"top":0.06464485,"width":0.0048204786,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"/","depth":8,"bounds":{"left":0.96127,"top":0.06464485,"width":0.0023271276,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"05","depth":9,"bounds":{"left":0.9645944,"top":0.06464485,"width":0.0048204786,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"/","depth":8,"bounds":{"left":0.97041225,"top":0.06464485,"width":0.002493351,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2026","depth":9,"bounds":{"left":0.97390294,"top":0.06464485,"width":0.009474734,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Calendar","depth":8,"bounds":{"left":0.9847075,"top":0.0650439,"width":0.0051529254,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXStaticText","text":"Monitor","depth":9,"bounds":{"left":0.45262632,"top":0.10853951,"width":0.013464096,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Jump to","depth":9,"bounds":{"left":0.8111702,"top":0.10853951,"width":0.01412899,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"--","depth":10,"bounds":{"left":0.8312833,"top":0.10814046,"width":0.0048204786,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":":","depth":9,"bounds":{"left":0.83710104,"top":0.10814046,"width":0.0023271276,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"--","depth":10,"bounds":{"left":0.84042555,"top":0.10814046,"width":0.0048204786,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Go","depth":8,"bounds":{"left":0.85920876,"top":0.10454908,"width":0.012300532,"height":0.018754989},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"APP TIMELINE · CLICK TO PLAY · DRAG SCROLLBAR TO PAN","depth":10,"bounds":{"left":0.45761302,"top":0.14964086,"width":0.10571808,"height":0.009976057},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"−","depth":9,"bounds":{"left":0.813996,"top":0.1452514,"width":0.009807181,"height":0.018754989},"on_screen":true,"help_text":"Zoom out","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"1×","depth":10,"bounds":{"left":0.82795876,"top":0.14924182,"width":0.004155585,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"+","depth":9,"bounds":{"left":0.83643615,"top":0.1452514,"width":0.009640957,"height":0.018754989},"on_screen":true,"help_text":"Zoom in","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Follow","depth":10,"bounds":{"left":0.8494016,"top":0.14924182,"width":0.004654255,"height":0.011173184},"on_screen":true,"help_text":"","role_description":"checkbox","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Follow","depth":10,"bounds":{"left":0.85538566,"top":0.14924182,"width":0.011136968,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"09:30","depth":13,"bounds":{"left":0.45894283,"top":0.21947326,"width":0.00880984,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"10:00","depth":13,"bounds":{"left":0.47573137,"top":0.21947326,"width":0.008144947,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"10:30","depth":13,"bounds":{"left":0.4921875,"top":0.21947326,"width":0.008144947,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"11:00","depth":13,"bounds":{"left":0.50880986,"top":0.21947326,"width":0.0078125,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"11:30","depth":13,"bounds":{"left":0.52526593,"top":0.21947326,"width":0.0078125,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"12:00","depth":13,"bounds":{"left":0.5415558,"top":0.21947326,"width":0.008144947,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"12:30","depth":13,"bounds":{"left":0.55801195,"top":0.21947326,"width":0.008144947,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"13:00","depth":13,"bounds":{"left":0.57430184,"top":0.21947326,"width":0.00831117,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"13:30","depth":13,"bounds":{"left":0.59075797,"top":0.21947326,"width":0.00831117,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"14:00","depth":13,"bounds":{"left":0.6072141,"top":0.21947326,"width":0.00831117,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"14:30","depth":13,"bounds":{"left":0.6236702,"top":0.21947326,"width":0.00831117,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"15:00","depth":13,"bounds":{"left":0.64012635,"top":0.21947326,"width":0.00831117,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"15:30","depth":13,"bounds":{"left":0.6565825,"top":0.21947326,"width":0.008144947,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"16:00","depth":13,"bounds":{"left":0.67303854,"top":0.21947326,"width":0.008144947,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"16:30","depth":13,"bounds":{"left":0.68949467,"top":0.21947326,"width":0.008144947,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"17:00","depth":13,"bounds":{"left":0.70611703,"top":0.21947326,"width":0.007978723,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"17:30","depth":13,"bounds":{"left":0.72257316,"top":0.21947326,"width":0.007978723,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"18:00","depth":13,"bounds":{"left":0.73886305,"top":0.21947326,"width":0.008144947,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"18:30","depth":13,"bounds":{"left":0.7553192,"top":0.21947326,"width":0.008144947,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"19:00","depth":13,"bounds":{"left":0.77177525,"top":0.21947326,"width":0.008144947,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"19:30","depth":13,"bounds":{"left":0.7882314,"top":0.21947326,"width":0.008144947,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"20:00","depth":13,"bounds":{"left":0.804355,"top":0.21947326,"width":0.008643617,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"20:30","depth":13,"bounds":{"left":0.82081115,"top":0.21947326,"width":0.008643617,"height":0.008778931},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
|
-3134105690210933431
|
9209073141279239133
|
visual_change
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
New Tab
New Tab
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Expand
Listen
Show thinking
Gemini said
Gemini said
Yes, those
.mp4
files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your
~/.screenpipe/data/
directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the
db.sqlite
database along with
precise timestamps and file path references
.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific
.mp4
file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw
.mp4
files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that
.mp4
file is your fallback source of truth to verify what was actually said.
How the Database Views Them
How the Database Views Them
If you were to open up your
db.sqlite
and query the tables (e.g.,
audio_transcriptions
), you would see that the database treats the
~/.screenpipe/data/
folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like
soundcore AeroClip (input)_2026-05-12_07-40-48.mp4
.
What happens if you delete them?
If you manually
rm
those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
Managing the Storage Footprint
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection:
ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving:
If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly
older
files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
Edit
You said yes I will do that. Is there a way to setup languages to transcription?
You said
yes I will do that. Is there a way to setup languages to transcription?
Listen
Show thinking
Gemini said
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription,
yes, you can configure it
, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
How the Transcription Engine Handles Language
By default, ScreenPipe uses a
multilingual Whisper model
and sets the language configuration to
Auto-Detect
.
When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage:
If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback:
Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
Open the ScreenPipe settings.
Navigate to the
Audio
or
AI/Transcription
section.
Look for the
Whisper Language
or
Transcription Language
dropdown.
Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).
If you are running ScreenPipe via CLI/Config:
You can modify your underlying configuration (usually found in
~/.screenpipe/pipe.json
or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:
(Slovak)
(Bulgarian)
(English)
If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
Good response
Bad response
Redo
Share and export
Copy
Show more options
Ask Gemini
Ask Gemini
Open upload file menu
Tools
Open mode picker
Pro
Microphone
Send message
Gemini is AI and can make mistakes, including about people.
Your privacy and Gemini Opens in a new window
Your privacy and Gemini
Opens in a new window
Summarize page
Summarize page
Screenpipe [archive.db · 763MB]
Screenpipe
[archive.db · 763MB]
Activity
Search
Audio
Work Report
Timetable
AI Summary
Date
12
/
05
/
2026
Calendar
Monitor
Jump to
--
:
--
Go
APP TIMELINE · CLICK TO PLAY · DRAG SCROLLBAR TO PAN
−
1×
+
Follow
Follow
09:30
10:00
10:30
11:00
11:30
12:00
12:30
13:00
13:30
14:00
14:30
15:00
15:30
16:00
16:30
17:00
17:30
18:00
18:30
19:00
19:30
20:00
20:30...
|
NULL
|
NULL
|
NULL
|
NULL
|
|
32061
|
1241
|
61
|
2026-05-13T09:04:39.724033+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-13/1778 /Users/lukas/.screenpipe/data/data/2026-05-13/1778663079724_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
New Tab
New Tab
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Expand
Listen
Show thinking
Gemini said
Gemini said
Yes, those
.mp4
files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your
~/.screenpipe/data/
directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the
db.sqlite
database along with
precise timestamps and file path references
.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific
.mp4
file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw
.mp4
files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that
.mp4
file is your fallback source of truth to verify what was actually said.
How the Database Views Them
How the Database Views Them
If you were to open up your
db.sqlite
and query the tables (e.g.,
audio_transcriptions
), you would see that the database treats the
~/.screenpipe/data/
folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like
soundcore AeroClip (input)_2026-05-12_07-40-48.mp4
.
What happens if you delete them?
If you manually
rm
those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
Managing the Storage Footprint
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection:
ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving:
If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly
older
files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
Edit
You said yes I will do that. Is there a way to setup languages to transcription?
You said
yes I will do that. Is there a way to setup languages to transcription?
Listen
Show thinking
Gemini said
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription,
yes, you can configure it
, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
How the Transcription Engine Handles Language
By default, ScreenPipe uses a
multilingual Whisper model
and sets the language configuration to
Auto-Detect
.
When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage:
If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback:
Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
Open the ScreenPipe settings.
Navigate to the
Audio
or
AI/Transcription
section.
Look for the
Whisper Language
or
Transcription Language
dropdown.
Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).
If you are running ScreenPipe via CLI/Config:
You can modify your underlying configuration (usually found in
~/.screenpipe/pipe.json
or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:
(Slovak)
(Bulgarian)
(English)
If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
Good response
Bad response
Redo
Share and export
Copy
Show more options
Ask Gemini
Ask Gemini
Open upload file menu
Tools
Open mode picker
Pro
Microphone
Send message
Gemini is AI and can make mistakes, including about people.
Your privacy and Gemini Opens in a new window
Your privacy and Gemini
Opens in a new window
Summarize page
Summarize page
Screenpipe [archive.db · 763MB]
Screenpipe
[archive.db · 763MB]
Activity
Search
Audio
Work Report
Timetable
AI Summary
Date
12
/
05
/
2026
Calendar
AUDIO TRANSCRIPTIONS
09:28
MIC
That's it.
09:32
MIC
And now...
09:32
MIC
Thank you.
09:33
MIC
Beijing
09:33
MIC
causing concentreras.
09:35
MIC
you
09:40
MIC
I'm gonna use the handkerchief to make sure that the handkerchief is not too thick.
09:43
MIC
I'm not gonna say that. I'm gonna say that.
09:44
MIC
Thank you.
09:45...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"New Tab","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"New Tab","depth":5,"bounds":{"left":0.013297873,"top":0.16121309,"width":0.014960106,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.18994413,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.3463687,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.3575419,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.38068634,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.05817819,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"bounds":{"left":0.17885639,"top":0.0,"width":0.0034906915,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.21010639,"height":0.037110932},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.011303191,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"bounds":{"left":0.120678194,"top":0.0,"width":0.008144947,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"bounds":{"left":0.14960106,"top":0.0,"width":0.020944148,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.21542554,"height":0.037110932},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"bounds":{"left":0.09142287,"top":0.029928172,"width":0.028922873,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"bounds":{"left":0.12034574,"top":0.029928172,"width":0.106715426,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"bounds":{"left":0.09142287,"top":0.029928172,"width":0.21991356,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"bounds":{"left":0.09142287,"top":0.050678372,"width":0.22174202,"height":0.057861134},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"bounds":{"left":0.0787899,"top":0.13288109,"width":0.22456782,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"bounds":{"left":0.0787899,"top":0.18715084,"width":0.036402926,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"bounds":{"left":0.091755316,"top":0.19592977,"width":0.017785905,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"bounds":{"left":0.075465426,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"bounds":{"left":0.08610372,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"bounds":{"left":0.09674202,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"bounds":{"left":0.107380316,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"bounds":{"left":0.11801862,"top":0.23184358,"width":0.010638298,"height":0.025538707},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy prompt","depth":21,"bounds":{"left":0.14029256,"top":0.292498,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":21,"bounds":{"left":0.16023937,"top":0.30207503,"width":0.13696809,"height":0.09577015},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"bounds":{"left":0.068484046,"top":0.3028731,"width":0.019946808,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":23,"bounds":{"left":0.16023937,"top":0.30367118,"width":0.13663563,"height":0.13128492},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Expand","depth":21,"bounds":{"left":0.29720744,"top":0.30207503,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Listen","depth":24,"bounds":{"left":0.3025266,"top":0.42976856,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"bounds":{"left":0.09208777,"top":0.43216282,"width":0.030917553,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"bounds":{"left":0.08976064,"top":0.4736632,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"bounds":{"left":0.08976064,"top":0.47565842,"width":0.04105718,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Yes, those","depth":27,"bounds":{"left":0.0787899,"top":0.47685555,"width":0.025930852,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"bounds":{"left":0.106715426,"top":0.47805268,"width":0.011136968,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.","depth":27,"bounds":{"left":0.0787899,"top":0.47685555,"width":0.23005319,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is exactly what those files are doing sitting in your","depth":27,"bounds":{"left":0.0787899,"top":0.5271349,"width":0.1341423,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"bounds":{"left":0.21492687,"top":0.528332,"width":0.053025264,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"directory and how they interact with the rest of the system.","depth":27,"bounds":{"left":0.0787899,"top":0.5271349,"width":0.22623006,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"1. The \"Time Machine\" Playback","depth":26,"bounds":{"left":0.0787899,"top":0.5885874,"width":0.234375,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. The \"Time Machine\" Playback","depth":27,"bounds":{"left":0.0787899,"top":0.59018356,"width":0.08194814,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the","depth":27,"bounds":{"left":0.0787899,"top":0.61652035,"width":0.20761304,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"bounds":{"left":0.24883644,"top":0.63846767,"width":0.025099734,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"database along with","depth":27,"bounds":{"left":0.0787899,"top":0.63727057,"width":0.22240691,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"precise timestamps and file path references","depth":27,"bounds":{"left":0.105053194,"top":0.65802073,"width":0.11419548,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"bounds":{"left":0.21924867,"top":0.65802073,"width":0.0013297872,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific","depth":27,"bounds":{"left":0.0787899,"top":0.6875499,"width":0.22755983,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"bounds":{"left":0.27293882,"top":0.7094972,"width":0.011136968,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.","depth":27,"bounds":{"left":0.0787899,"top":0.70830005,"width":0.23354389,"height":0.07861133},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. Re-processing and Model Upgrades","depth":26,"bounds":{"left":0.0787899,"top":0.811253,"width":0.234375,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. Re-processing and Model Upgrades","depth":27,"bounds":{"left":0.0787899,"top":0.81284916,"width":0.09840426,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw","depth":27,"bounds":{"left":0.0787899,"top":0.83918595,"width":0.23254654,"height":0.07861133},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"bounds":{"left":0.117519945,"top":0.90263367,"width":0.011136968,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.","depth":27,"bounds":{"left":0.0787899,"top":0.90143657,"width":0.23038563,"height":0.057861134},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The Source of Truth for Hallucinations","depth":26,"bounds":{"left":0.0787899,"top":0.98363924,"width":0.234375,"height":0.01636076},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The Source of Truth for Hallucinations","depth":27,"bounds":{"left":0.0787899,"top":0.98523545,"width":0.10405585,"height":0.014764547},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that","depth":27,"bounds":{"left":0.0787899,"top":1.0,"width":0.2209109,"height":-0.011572242},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"bounds":{"left":0.14660904,"top":1.0,"width":0.011303191,"height":-0.05426979},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file is your fallback source of truth to verify what was actually said.","depth":27,"bounds":{"left":0.0787899,"top":1.0,"width":0.23121676,"height":-0.05307257},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How the Database Views Them","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How the Database Views Them","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you were to open up your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"and query the tables (e.g.,","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"audio_transcriptions","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), you would see that the database treats the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"soundcore AeroClip (input)_2026-05-12_07-40-48.mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"What happens if you delete them?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you manually","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"rm","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a \"file not found\" error in the background logs.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Managing the Storage Footprint","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Managing the Storage Footprint","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe records continuously, this folder will inevitably grow over time.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Built-in Garbage Collection:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Archiving:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you prefer to keep a permanent, searchable \"life log\" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"older","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Edit","depth":21,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said yes I will do that. Is there a way to setup languages to transcription?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"yes I will do that. Is there a way to setup languages to transcription?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As for setting up languages for transcription,","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"yes, you can configure it","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":", but it helps to understand how ScreenPipe handles language natively first.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How the Transcription Engine Handles Language","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How the Transcription Engine Handles Language","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"multilingual Whisper model","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"and sets the language configuration to","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Auto-Detect","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Advantage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Drawback:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How to Force a Specific Language","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How to Force a Specific Language","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you are using the Desktop App UI:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Open the ScreenPipe settings.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Navigate to the","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"AI/Transcription","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"section.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Look for the","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper Language","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Transcription Language","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"dropdown.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Change it from \"Auto\" to your specific language (e.g., English, Bulgarian, or Slovak).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you are running ScreenPipe via CLI/Config:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You can modify your underlying configuration (usually found in","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/pipe.json","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"(Slovak)","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"(Bulgarian)","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"(English)","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":23,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":23,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Redo","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Share and export","depth":22,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":22,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXTextArea","text":"Ask Gemini","depth":20,"bounds":{"left":0.08211436,"top":0.83439744,"width":0.22573139,"height":0.01915403},"on_screen":true,"value":"Ask Gemini","help_text":"","role_description":"text entry area","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Ask Gemini","depth":21,"bounds":{"left":0.08211436,"top":0.8347965,"width":0.030086435,"height":0.018355945},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Open upload file menu","depth":20,"bounds":{"left":0.078125,"top":0.87031126,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Tools","depth":18,"bounds":{"left":0.094082445,"top":0.87031126,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Open mode picker","depth":20,"bounds":{"left":0.27044547,"top":0.867917,"width":0.026097074,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Pro","depth":23,"bounds":{"left":0.2757646,"top":0.87669593,"width":0.007480053,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Microphone","depth":19,"bounds":{"left":0.29853722,"top":0.867917,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Send message","depth":19,"bounds":{"left":0.30485374,"top":0.8671189,"width":0.013962766,"height":0.033519555},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":false,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini is AI and can make mistakes, including about people.","depth":17,"bounds":{"left":0.11702128,"top":0.92178774,"width":0.11170213,"height":0.012370312},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXLink","text":"Your privacy and Gemini Opens in a new window","depth":17,"bounds":{"left":0.2287234,"top":0.92178774,"width":0.044215426,"height":0.012370312},"on_screen":true,"help_text":"","role_description":"link","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Your privacy and Gemini","depth":18,"bounds":{"left":0.2287234,"top":0.92178774,"width":0.044215426,"height":0.012370312},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Opens in a new window","depth":19,"bounds":{"left":0.068484046,"top":0.92098963,"width":0.043218084,"height":0.012370312},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Summarize page","depth":7,"bounds":{"left":0.07413564,"top":0.95730245,"width":0.053523935,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Summarize page","depth":9,"bounds":{"left":0.07978723,"top":0.96249,"width":0.042220745,"height":0.015163607},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Screenpipe [archive.db · 763MB]","depth":7,"bounds":{"left":0.33061835,"top":0.061053474,"width":0.06067154,"height":0.017956903},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Screenpipe","depth":8,"bounds":{"left":0.33061835,"top":0.06264964,"width":0.027759308,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"[archive.db · 763MB]","depth":9,"bounds":{"left":0.35970744,"top":0.066640064,"width":0.03158245,"height":0.009976057},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Activity","depth":7,"bounds":{"left":0.39594415,"top":0.059856344,"width":0.024933511,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Search","depth":7,"bounds":{"left":0.42154256,"top":0.059856344,"width":0.023603724,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Audio","depth":7,"bounds":{"left":0.44581118,"top":0.059856344,"width":0.020944148,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":true,"is_selected":false},{"role":"AXButton","text":"Work Report","depth":7,"bounds":{"left":0.46742022,"top":0.059856344,"width":0.03507314,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Timetable","depth":7,"bounds":{"left":0.5031583,"top":0.059856344,"width":0.029753989,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Summary","depth":7,"bounds":{"left":0.53357714,"top":0.059856344,"width":0.034075797,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Date","depth":8,"bounds":{"left":0.93866354,"top":0.06464485,"width":0.008144947,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"12","depth":9,"bounds":{"left":0.95545214,"top":0.06424581,"width":0.0048204786,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"/","depth":8,"bounds":{"left":0.96127,"top":0.06424581,"width":0.0023271276,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"05","depth":9,"bounds":{"left":0.9645944,"top":0.06424581,"width":0.0048204786,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"/","depth":8,"bounds":{"left":0.97041225,"top":0.06424581,"width":0.002493351,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2026","depth":9,"bounds":{"left":0.97390294,"top":0.06424581,"width":0.009474734,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Calendar","depth":8,"bounds":{"left":0.9847075,"top":0.06464485,"width":0.0051529254,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXStaticText","text":"AUDIO TRANSCRIPTIONS","depth":9,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"09:28","depth":10,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"MIC","depth":10,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"That's it.","depth":10,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"09:32","depth":10,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"MIC","depth":10,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"And now...","depth":10,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"09:32","depth":10,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"MIC","depth":10,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Thank you.","depth":10,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"09:33","depth":10,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"MIC","depth":10,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Beijing","depth":10,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"09:33","depth":10,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"MIC","depth":10,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"causing concentreras.","depth":10,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"09:35","depth":10,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"MIC","depth":10,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"you","depth":10,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"09:40","depth":10,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"MIC","depth":10,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"I'm gonna use the handkerchief to make sure that the handkerchief is not too thick.","depth":10,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"09:43","depth":10,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"MIC","depth":10,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"I'm not gonna say that. I'm gonna say that.","depth":10,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"09:44","depth":10,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"MIC","depth":10,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Thank you.","depth":10,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"09:45","depth":10,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
|
-1699805007315567094
|
9209073141279239133
|
visual_change
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
New Tab
New Tab
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Expand
Listen
Show thinking
Gemini said
Gemini said
Yes, those
.mp4
files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your
~/.screenpipe/data/
directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the
db.sqlite
database along with
precise timestamps and file path references
.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific
.mp4
file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw
.mp4
files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that
.mp4
file is your fallback source of truth to verify what was actually said.
How the Database Views Them
How the Database Views Them
If you were to open up your
db.sqlite
and query the tables (e.g.,
audio_transcriptions
), you would see that the database treats the
~/.screenpipe/data/
folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like
soundcore AeroClip (input)_2026-05-12_07-40-48.mp4
.
What happens if you delete them?
If you manually
rm
those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
Managing the Storage Footprint
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection:
ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving:
If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly
older
files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
Edit
You said yes I will do that. Is there a way to setup languages to transcription?
You said
yes I will do that. Is there a way to setup languages to transcription?
Listen
Show thinking
Gemini said
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription,
yes, you can configure it
, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
How the Transcription Engine Handles Language
By default, ScreenPipe uses a
multilingual Whisper model
and sets the language configuration to
Auto-Detect
.
When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage:
If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback:
Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
Open the ScreenPipe settings.
Navigate to the
Audio
or
AI/Transcription
section.
Look for the
Whisper Language
or
Transcription Language
dropdown.
Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).
If you are running ScreenPipe via CLI/Config:
You can modify your underlying configuration (usually found in
~/.screenpipe/pipe.json
or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:
(Slovak)
(Bulgarian)
(English)
If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
Good response
Bad response
Redo
Share and export
Copy
Show more options
Ask Gemini
Ask Gemini
Open upload file menu
Tools
Open mode picker
Pro
Microphone
Send message
Gemini is AI and can make mistakes, including about people.
Your privacy and Gemini Opens in a new window
Your privacy and Gemini
Opens in a new window
Summarize page
Summarize page
Screenpipe [archive.db · 763MB]
Screenpipe
[archive.db · 763MB]
Activity
Search
Audio
Work Report
Timetable
AI Summary
Date
12
/
05
/
2026
Calendar
AUDIO TRANSCRIPTIONS
09:28
MIC
That's it.
09:32
MIC
And now...
09:32
MIC
Thank you.
09:33
MIC
Beijing
09:33
MIC
causing concentreras.
09:35
MIC
you
09:40
MIC
I'm gonna use the handkerchief to make sure that the handkerchief is not too thick.
09:43
MIC
I'm not gonna say that. I'm gonna say that.
09:44
MIC
Thank you.
09:45...
|
32060
|
NULL
|
NULL
|
NULL
|
|
39122
|
1444
|
0
|
2026-05-14T06:36:37.356381+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740597356_m2.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_2
|
NULL
|
NULL
|
NULL
|
NULL
|
Space
Screenpipe — Archive
Screenpipe — Archive
Al Space
Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Expand
Listen
Show thinking
Gemini said
Gemini said
Yes, those
.mp4
files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your
~/.screenpipe/data/
directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the
db.sqlite
database along with
precise timestamps and file path references
.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific
.mp4
file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw
.mp4
files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that
.mp4
file is your fallback source of truth to verify what was actually said.
How the Database Views Them
How the Database Views Them
If you were to open up your
db.sqlite
and query the tables (e.g.,
audio_transcriptions
), you would see that the database treats the
~/.screenpipe/data/
folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like
soundcore AeroClip (input)_2026-05-12_07-40-48.mp4
.
What happens if you delete them?
If you manually
rm
those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
Managing the Storage Footprint
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection:
ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving:
If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly
older
files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
Edit
You said yes I will do that. Is there a way to setup languages to transcription?
You said
yes I will do that. Is there a way to setup languages to transcription?
Listen
Show thinking
Gemini said
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription,
yes, you can configure it
, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
How the Transcription Engine Handles Language
By default, ScreenPipe uses a
multilingual Whisper model
and sets the language configuration to
Auto-Detect
.
When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage:
If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback:
Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
Open the ScreenPipe settings.
Navigate to the
Audio
or
AI/Transcription
section.
Look for the
Whisper Language
or
Transcription Language
dropdown.
Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).
If you are running ScreenPipe via CLI/Config:
You can modify your underlying configuration (usually found in
~/.screenpipe/pipe.json
or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:
(Slovak)
(Bulgarian)
(English)
If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
Good response
Bad response
Redo
Share and export
Copy
Show more options
Ask Gemini
Ask Gemini
Open upload file menu
Tools
Open mode picker
Pro
Microphone
Send message
Gemini is AI and can make mistakes, including about people.
Your privacy and Gemini Opens in a new window
Your privacy and Gemini
Opens in a new window
Summarize page
Summarize page
Screenpipe [archive.db · 3234.2MB]
Screenpipe
[archive.db · 3234.2MB]
Activity
Search
Audio
Work Report
Timetable
AI Summary
Date
12
/
05
/
2026
Calendar
Monitor
Jump to
--
:
--
Go
📅 Time Range Filter ▶
📅 Time Range Filter
▶
APP TIMELINE · CLICK TO PLAY · DRAG SCROLLBAR TO PAN
−
1×
+
Follow
Follow...
|
[{"role":"AXStaticText","text& [{"role":"AXStaticText","text":"Space","depth":2,"bounds":{"left":0.5262633,"top":0.9688747,"width":0.010804521,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.0518755,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.06304868,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"bounds":{"left":0.0,"top":0.08459697,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"bounds":{"left":0.013297873,"top":0.09577015,"width":0.029587766,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"bounds":{"left":0.0,"top":0.11731844,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"bounds":{"left":0.013297873,"top":0.12849163,"width":0.036901597,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"bounds":{"left":0.0,"top":0.15003991,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"bounds":{"left":0.013297873,"top":0.16121309,"width":0.037898935,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"bounds":{"left":0.05651596,"top":0.15722266,"width":0.007978723,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"bounds":{"left":0.0,"top":0.18276137,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"bounds":{"left":0.013297873,"top":0.19393456,"width":0.040724736,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"bounds":{"left":0.0,"top":0.21548285,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"bounds":{"left":0.013297873,"top":0.22665602,"width":0.03756649,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"bounds":{"left":0.0,"top":0.2482043,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"bounds":{"left":0.013297873,"top":0.25937748,"width":0.012134309,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"bounds":{"left":0.0,"top":0.28092578,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"bounds":{"left":0.013297873,"top":0.29209897,"width":0.1100399,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"bounds":{"left":0.0,"top":0.31364724,"width":0.06881649,"height":0.032721467},"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"bounds":{"left":0.013297873,"top":0.32482043,"width":0.05668218,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"bounds":{"left":0.0028257978,"top":0.34796488,"width":0.06333112,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"bounds":{"left":0.0028257978,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"bounds":{"left":0.013796543,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"bounds":{"left":0.024933511,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"bounds":{"left":0.036070477,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"bounds":{"left":0.04720745,"top":0.97007185,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"bounds":{"left":0.29321808,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"bounds":{"left":0.30518618,"top":0.055067837,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"bounds":{"left":0.3025266,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"bounds":{"left":0.07280585,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.08610372,"top":0.10454908,"width":0.028590426,"height":0.030327214},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"bounds":{"left":0.0887633,"top":0.10973663,"width":0.021941489,"height":0.020351157},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"bounds":{"left":0.2613032,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"bounds":{"left":0.27460107,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"bounds":{"left":0.28789893,"top":0.103751,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"bounds":{"left":0.068484046,"top":0.14764565,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"bounds":{"left":0.068484046,"top":0.15003991,"width":0.1200133,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Expand","depth":21,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Yes, those","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is exactly what those files are doing sitting in your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"directory and how they interact with the rest of the system.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"1. The \"Time Machine\" Playback","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. The \"Time Machine\" Playback","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"database along with","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"precise timestamps and file path references","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. Re-processing and Model Upgrades","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. Re-processing and Model Upgrades","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The Source of Truth for Hallucinations","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The Source of Truth for Hallucinations","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file is your fallback source of truth to verify what was actually said.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How the Database Views Them","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How the Database Views Them","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you were to open up your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"and query the tables (e.g.,","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"audio_transcriptions","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), you would see that the database treats the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"soundcore AeroClip (input)_2026-05-12_07-40-48.mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"What happens if you delete them?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you manually","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"rm","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a \"file not found\" error in the background logs.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Managing the Storage Footprint","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Managing the Storage Footprint","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe records continuously, this folder will inevitably grow over time.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Built-in Garbage Collection:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Archiving:","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.02642952,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you prefer to keep a permanent, searchable \"life log\" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.22041224,"height":0.057861134},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"older","depth":29,"bounds":{"left":0.18035239,"top":0.0,"width":0.015292553,"height":0.016360734},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.","depth":29,"bounds":{"left":0.09142287,"top":0.0,"width":0.22174202,"height":0.057861134},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"bounds":{"left":0.075465426,"top":0.029928172,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"bounds":{"left":0.08610372,"top":0.029928172,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"bounds":{"left":0.09674202,"top":0.029928172,"width":0.010638298,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"bounds":{"left":0.107380316,"top":0.029928172,"width":0.010638298,"height":0.025538707},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"bounds":{"left":0.11801862,"top":0.029928172,"width":0.010638298,"height":0.025538707},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy prompt","depth":21,"bounds":{"left":0.12566489,"top":0.0905826,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Edit","depth":21,"bounds":{"left":0.14029256,"top":0.0905826,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said yes I will do that. Is there a way to setup languages to transcription?","depth":21,"bounds":{"left":0.16023937,"top":0.100159615,"width":0.15026596,"height":0.03830806},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"bounds":{"left":0.068484046,"top":0.1009577,"width":0.019946808,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"yes I will do that. Is there a way to setup languages to transcription?","depth":23,"bounds":{"left":0.16023937,"top":0.10175578,"width":0.12849069,"height":0.035514764},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"bounds":{"left":0.3025266,"top":0.17039107,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"bounds":{"left":0.09208777,"top":0.17278531,"width":0.030917553,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"bounds":{"left":0.08976064,"top":0.21428572,"width":0.0003324468,"height":0.0007980846},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"bounds":{"left":0.08976064,"top":0.21628092,"width":0.04105718,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.","depth":27,"bounds":{"left":0.0787899,"top":0.21747805,"width":0.23088431,"height":0.057861134},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As for setting up languages for transcription,","depth":27,"bounds":{"left":0.0787899,"top":0.28850758,"width":0.10920878,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"yes, you can configure it","depth":27,"bounds":{"left":0.18799867,"top":0.28850758,"width":0.06333112,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":", but it helps to understand how ScreenPipe handles language natively first.","depth":27,"bounds":{"left":0.0787899,"top":0.28850758,"width":0.20994017,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How the Transcription Engine Handles Language","depth":26,"bounds":{"left":0.0787899,"top":0.3499601,"width":0.234375,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How the Transcription Engine Handles Language","depth":27,"bounds":{"left":0.0787899,"top":0.35155627,"width":0.12549867,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses a","depth":27,"bounds":{"left":0.0787899,"top":0.37789306,"width":0.072972074,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"multilingual Whisper model","depth":27,"bounds":{"left":0.15176196,"top":0.37789306,"width":0.07047872,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"and sets the language configuration to","depth":27,"bounds":{"left":0.0787899,"top":0.37789306,"width":0.23321144,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Auto-Detect","depth":27,"bounds":{"left":0.08510638,"top":0.39864326,"width":0.032247342,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"bounds":{"left":0.11735372,"top":0.39864326,"width":0.0013297872,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.","depth":27,"bounds":{"left":0.0787899,"top":0.42817238,"width":0.23038563,"height":0.037110932},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Advantage:","depth":29,"bounds":{"left":0.09142287,"top":0.47845173,"width":0.040724736,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.","depth":29,"bounds":{"left":0.09142287,"top":0.47845173,"width":0.22174202,"height":0.057861134},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Drawback:","depth":29,"bounds":{"left":0.09142287,"top":0.5494813,"width":0.038896278,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.","depth":29,"bounds":{"left":0.09142287,"top":0.5494813,"width":0.20744681,"height":0.09936153},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How to Force a Specific Language","depth":26,"bounds":{"left":0.0787899,"top":0.67318434,"width":0.234375,"height":0.01915403},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How to Force a Specific Language","depth":27,"bounds":{"left":0.0787899,"top":0.67478055,"width":0.08759973,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.","depth":27,"bounds":{"left":0.0787899,"top":0.70111734,"width":0.2278923,"height":0.057861134},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you are using the Desktop App UI:","depth":27,"bounds":{"left":0.0787899,"top":0.7721468,"width":0.09275266,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Open the ScreenPipe settings.","depth":29,"bounds":{"left":0.09142287,"top":0.801676,"width":0.07347074,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Navigate to the","depth":29,"bounds":{"left":0.09142287,"top":0.8312051,"width":0.038231384,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio","depth":29,"bounds":{"left":0.12965426,"top":0.8312051,"width":0.014960106,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"bounds":{"left":0.14461437,"top":0.8312051,"width":0.0078125,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"AI/Transcription","depth":29,"bounds":{"left":0.15242687,"top":0.8312051,"width":0.041888297,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"section.","depth":29,"bounds":{"left":0.19431517,"top":0.8312051,"width":0.02044548,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Look for the","depth":29,"bounds":{"left":0.09142287,"top":0.8607342,"width":0.030585106,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper Language","depth":29,"bounds":{"left":0.12200798,"top":0.8607342,"width":0.04837101,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"bounds":{"left":0.17037898,"top":0.8607342,"width":0.0078125,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Transcription Language","depth":29,"bounds":{"left":0.17819148,"top":0.8607342,"width":0.061502658,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"dropdown.","depth":29,"bounds":{"left":0.23969415,"top":0.8607342,"width":0.027260639,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Change it from \"Auto\" to your specific language (e.g., English, Bulgarian, or Slovak).","depth":29,"bounds":{"left":0.09142287,"top":0.8902634,"width":0.20079787,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you are running ScreenPipe via CLI/Config:","depth":27,"bounds":{"left":0.0787899,"top":0.92378294,"width":0.116023935,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You can modify your underlying configuration (usually found in","depth":27,"bounds":{"left":0.0787899,"top":0.9445331,"width":0.15159574,"height":0.016360734},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/pipe.json","depth":28,"bounds":{"left":0.23238032,"top":0.94573027,"width":0.064328454,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:","depth":27,"bounds":{"left":0.0787899,"top":0.9445331,"width":0.234375,"height":0.05546689},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"(Slovak)","depth":29,"bounds":{"left":0.14012633,"top":1.0,"width":0.020777926,"height":-0.015562654},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"(Bulgarian)","depth":29,"bounds":{"left":0.14012633,"top":1.0,"width":0.027925532,"height":-0.04509175},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"(English)","depth":29,"bounds":{"left":0.14012633,"top":1.0,"width":0.02244016,"height":-0.07462096},"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":23,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":23,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Redo","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Share and export","depth":22,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":22,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXTextArea","text":"Ask Gemini","depth":20,"bounds":{"left":0.08211436,"top":0.83439744,"width":0.22573139,"height":0.01915403},"on_screen":true,"value":"Ask Gemini","help_text":"","role_description":"text entry area","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Ask Gemini","depth":21,"bounds":{"left":0.08211436,"top":0.8347965,"width":0.030086435,"height":0.018355945},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Open upload file menu","depth":20,"bounds":{"left":0.078125,"top":0.87031126,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Tools","depth":18,"bounds":{"left":0.094082445,"top":0.87031126,"width":0.013297873,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Open mode picker","depth":20,"bounds":{"left":0.27044547,"top":0.867917,"width":0.026097074,"height":0.031923383},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Pro","depth":23,"bounds":{"left":0.2757646,"top":0.87669593,"width":0.007480053,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Microphone","depth":19,"bounds":{"left":0.29853722,"top":0.867917,"width":0.013297873,"height":0.031923383},"on_screen":true,"role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Send message","depth":19,"bounds":{"left":0.30485374,"top":0.8671189,"width":0.013962766,"height":0.033519555},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":false,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini is AI and can make mistakes, including about people.","depth":17,"bounds":{"left":0.11702128,"top":0.92178774,"width":0.11170213,"height":0.012370312},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXLink","text":"Your privacy and Gemini Opens in a new window","depth":17,"bounds":{"left":0.2287234,"top":0.92178774,"width":0.044215426,"height":0.012370312},"on_screen":true,"help_text":"","role_description":"link","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Your privacy and Gemini","depth":18,"bounds":{"left":0.2287234,"top":0.92178774,"width":0.044215426,"height":0.012370312},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Opens in a new window","depth":19,"bounds":{"left":0.068484046,"top":0.92098963,"width":0.043218084,"height":0.012370312},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Summarize page","depth":7,"bounds":{"left":0.07413564,"top":0.95730245,"width":0.053523935,"height":0.025538707},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Summarize page","depth":9,"bounds":{"left":0.07978723,"top":0.96249,"width":0.042220745,"height":0.015163607},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Screenpipe [archive.db · 3234.2MB]","depth":7,"bounds":{"left":0.33061835,"top":0.061452515,"width":0.06565824,"height":0.017956903},"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Screenpipe","depth":8,"bounds":{"left":0.33061835,"top":0.06304868,"width":0.027759308,"height":0.014764565},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"[archive.db · 3234.2MB]","depth":9,"bounds":{"left":0.35970744,"top":0.06703911,"width":0.03656915,"height":0.009976057},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Activity","depth":7,"bounds":{"left":0.40093085,"top":0.059856344,"width":0.024767287,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Search","depth":7,"bounds":{"left":0.42636302,"top":0.059856344,"width":0.023603724,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Audio","depth":7,"bounds":{"left":0.45063165,"top":0.059856344,"width":0.021110373,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Work Report","depth":7,"bounds":{"left":0.47240692,"top":0.059856344,"width":0.034906916,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Timetable","depth":7,"bounds":{"left":0.50797874,"top":0.059856344,"width":0.029753989,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Summary","depth":7,"bounds":{"left":0.5383976,"top":0.059856344,"width":0.034242023,"height":0.0207502},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Date","depth":8,"bounds":{"left":0.93866354,"top":0.0650439,"width":0.008144947,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"12","depth":9,"bounds":{"left":0.95545214,"top":0.06464485,"width":0.0048204786,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"/","depth":8,"bounds":{"left":0.96127,"top":0.06464485,"width":0.0023271276,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"05","depth":9,"bounds":{"left":0.9645944,"top":0.06464485,"width":0.0048204786,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"/","depth":8,"bounds":{"left":0.97041225,"top":0.06464485,"width":0.002493351,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2026","depth":9,"bounds":{"left":0.97390294,"top":0.06464485,"width":0.009474734,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Calendar","depth":8,"bounds":{"left":0.9847075,"top":0.0650439,"width":0.0051529254,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXStaticText","text":"Monitor","depth":9,"bounds":{"left":0.45262632,"top":0.10853951,"width":0.013464096,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Jump to","depth":9,"bounds":{"left":0.8111702,"top":0.10853951,"width":0.01412899,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"--","depth":10,"bounds":{"left":0.8312833,"top":0.10814046,"width":0.0048204786,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":":","depth":9,"bounds":{"left":0.83710104,"top":0.10814046,"width":0.0023271276,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"--","depth":10,"bounds":{"left":0.84042555,"top":0.10814046,"width":0.0048204786,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Go","depth":8,"bounds":{"left":0.85920876,"top":0.10454908,"width":0.012300532,"height":0.018754989},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"📅 Time Range Filter ▶","depth":9,"bounds":{"left":0.45561835,"top":0.14205906,"width":0.41289893,"height":0.019553073},"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"📅 Time Range Filter","depth":10,"bounds":{"left":0.642121,"top":0.14604948,"width":0.036070477,"height":0.011572227},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"▶","depth":10,"bounds":{"left":0.67918885,"top":0.14604948,"width":0.0028257978,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"APP TIMELINE · CLICK TO PLAY · DRAG SCROLLBAR TO PAN","depth":10,"bounds":{"left":0.45761302,"top":0.19313647,"width":0.10571808,"height":0.009976057},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"−","depth":9,"bounds":{"left":0.813996,"top":0.188747,"width":0.009807181,"height":0.018754989},"on_screen":true,"help_text":"Zoom out","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"1×","depth":10,"bounds":{"left":0.82795876,"top":0.19273743,"width":0.004155585,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"+","depth":9,"bounds":{"left":0.83643615,"top":0.188747,"width":0.009640957,"height":0.018754989},"on_screen":true,"help_text":"Zoom in","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Follow","depth":10,"bounds":{"left":0.8494016,"top":0.19273743,"width":0.004654255,"height":0.011173184},"on_screen":true,"help_text":"","role_description":"checkbox","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Follow","depth":10,"bounds":{"left":0.85538566,"top":0.19273743,"width":0.011136968,"height":0.010774142},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
|
146208447099203600
|
9209073141279239133
|
visual_change
|
accessibility
|
NULL
|
Space
Screenpipe — Archive
Screenpipe — Archive
Al Space
Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Expand
Listen
Show thinking
Gemini said
Gemini said
Yes, those
.mp4
files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your
~/.screenpipe/data/
directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the
db.sqlite
database along with
precise timestamps and file path references
.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific
.mp4
file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw
.mp4
files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that
.mp4
file is your fallback source of truth to verify what was actually said.
How the Database Views Them
How the Database Views Them
If you were to open up your
db.sqlite
and query the tables (e.g.,
audio_transcriptions
), you would see that the database treats the
~/.screenpipe/data/
folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like
soundcore AeroClip (input)_2026-05-12_07-40-48.mp4
.
What happens if you delete them?
If you manually
rm
those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
Managing the Storage Footprint
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection:
ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving:
If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly
older
files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
Edit
You said yes I will do that. Is there a way to setup languages to transcription?
You said
yes I will do that. Is there a way to setup languages to transcription?
Listen
Show thinking
Gemini said
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription,
yes, you can configure it
, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
How the Transcription Engine Handles Language
By default, ScreenPipe uses a
multilingual Whisper model
and sets the language configuration to
Auto-Detect
.
When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage:
If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback:
Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
Open the ScreenPipe settings.
Navigate to the
Audio
or
AI/Transcription
section.
Look for the
Whisper Language
or
Transcription Language
dropdown.
Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).
If you are running ScreenPipe via CLI/Config:
You can modify your underlying configuration (usually found in
~/.screenpipe/pipe.json
or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:
(Slovak)
(Bulgarian)
(English)
If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
Good response
Bad response
Redo
Share and export
Copy
Show more options
Ask Gemini
Ask Gemini
Open upload file menu
Tools
Open mode picker
Pro
Microphone
Send message
Gemini is AI and can make mistakes, including about people.
Your privacy and Gemini Opens in a new window
Your privacy and Gemini
Opens in a new window
Summarize page
Summarize page
Screenpipe [archive.db · 3234.2MB]
Screenpipe
[archive.db · 3234.2MB]
Activity
Search
Audio
Work Report
Timetable
AI Summary
Date
12
/
05
/
2026
Calendar
Monitor
Jump to
--
:
--
Go
📅 Time Range Filter ▶
📅 Time Range Filter
▶
APP TIMELINE · CLICK TO PLAY · DRAG SCROLLBAR TO PAN
−
1×
+
Follow
Follow...
|
39121
|
NULL
|
NULL
|
NULL
|
|
38743
|
1437
|
33
|
2026-05-14T06:27:49.541988+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740069541_m1.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_1
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Expand
Listen
Show thinking
Gemini said
Gemini said
Yes, those
.mp4
files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your
~/.screenpipe/data/
directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the
db.sqlite
database along with
precise timestamps and file path references
.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific
.mp4
file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw
.mp4
files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that
.mp4
file is your fallback source of truth to verify what was actually said.
How the Database Views Them
How the Database Views Them
If you were to open up your
db.sqlite
and query the tables (e.g.,
audio_transcriptions
), you would see that the database treats the
~/.screenpipe/data/
folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like
soundcore AeroClip (input)_2026-05-12_07-40-48.mp4
.
What happens if you delete them?
If you manually
rm
those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
Managing the Storage Footprint
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection:
ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving:
If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly
older
files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
Edit
You said yes I will do that. Is there a way to setup languages to transcription?
You said
yes I will do that. Is there a way to setup languages to transcription?
Listen
Show thinking
Gemini said
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription,
yes, you can configure it
, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
How the Transcription Engine Handles Language
By default, ScreenPipe uses a
multilingual Whisper model
and sets the language configuration to
Auto-Detect
.
When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage:
If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback:
Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
Open the ScreenPipe settings.
Navigate to the
Audio
or
AI/Transcription
section.
Look for the
Whisper Language
or
Transcription Language
dropdown.
Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).
If you are running ScreenPipe via CLI/Config:
You can modify your underlying configuration (usually found in
~/.screenpipe/pipe.json
or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:
(Slovak)
(Bulgarian)
(English)
If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
Good response
Bad response
Redo
Share and export
Copy
Show more options
Ask Gemini
Ask Gemini
Open upload file menu
Tools
Open mode picker
Pro
Microphone
Send message
Gemini is AI and can make mistakes, including about people.
Your privacy and Gemini Opens in a new window...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Expand","depth":21,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Yes, those","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is exactly what those files are doing sitting in your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"directory and how they interact with the rest of the system.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"1. The \"Time Machine\" Playback","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. The \"Time Machine\" Playback","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"database along with","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"precise timestamps and file path references","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. Re-processing and Model Upgrades","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. Re-processing and Model Upgrades","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The Source of Truth for Hallucinations","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The Source of Truth for Hallucinations","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file is your fallback source of truth to verify what was actually said.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How the Database Views Them","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How the Database Views Them","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you were to open up your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"and query the tables (e.g.,","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"audio_transcriptions","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), you would see that the database treats the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"soundcore AeroClip (input)_2026-05-12_07-40-48.mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"What happens if you delete them?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you manually","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"rm","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a \"file not found\" error in the background logs.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Managing the Storage Footprint","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Managing the Storage Footprint","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe records continuously, this folder will inevitably grow over time.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Built-in Garbage Collection:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Archiving:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you prefer to keep a permanent, searchable \"life log\" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"older","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Edit","depth":21,"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said yes I will do that. Is there a way to setup languages to transcription?","depth":21,"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"yes I will do that. Is there a way to setup languages to transcription?","depth":23,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As for setting up languages for transcription,","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"yes, you can configure it","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":", but it helps to understand how ScreenPipe handles language natively first.","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How the Transcription Engine Handles Language","depth":26,"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How the Transcription Engine Handles Language","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses a","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"multilingual Whisper model","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"and sets the language configuration to","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Auto-Detect","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Advantage:","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Drawback:","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How to Force a Specific Language","depth":26,"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How to Force a Specific Language","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you are using the Desktop App UI:","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Open the ScreenPipe settings.","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Navigate to the","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"AI/Transcription","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"section.","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Look for the","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper Language","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Transcription Language","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"dropdown.","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Change it from \"Auto\" to your specific language (e.g., English, Bulgarian, or Slovak).","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you are running ScreenPipe via CLI/Config:","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You can modify your underlying configuration (usually found in","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/pipe.json","depth":28,"bounds":{"left":0.0,"top":0.0,"width":0.134375,"height":0.020555556},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"(Slovak)","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"(Bulgarian)","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"(English)","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":23,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":23,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Redo","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Share and export","depth":22,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":22,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXTextArea","text":"Ask Gemini","depth":20,"on_screen":true,"value":"Ask Gemini","help_text":"","role_description":"text entry area","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Ask Gemini","depth":21,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Open upload file menu","depth":20,"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Tools","depth":18,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Open mode picker","depth":20,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Pro","depth":23,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Microphone","depth":19,"on_screen":true,"role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Send message","depth":19,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":false,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini is AI and can make mistakes, including about people.","depth":17,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXLink","text":"Your privacy and Gemini Opens in a new window","depth":17,"on_screen":true,"help_text":"","role_description":"link","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false}]...
|
7252672520734540263
|
9209073003857062877
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Expand
Listen
Show thinking
Gemini said
Gemini said
Yes, those
.mp4
files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your
~/.screenpipe/data/
directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the
db.sqlite
database along with
precise timestamps and file path references
.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific
.mp4
file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw
.mp4
files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that
.mp4
file is your fallback source of truth to verify what was actually said.
How the Database Views Them
How the Database Views Them
If you were to open up your
db.sqlite
and query the tables (e.g.,
audio_transcriptions
), you would see that the database treats the
~/.screenpipe/data/
folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like
soundcore AeroClip (input)_2026-05-12_07-40-48.mp4
.
What happens if you delete them?
If you manually
rm
those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
Managing the Storage Footprint
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection:
ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving:
If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly
older
files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
Edit
You said yes I will do that. Is there a way to setup languages to transcription?
You said
yes I will do that. Is there a way to setup languages to transcription?
Listen
Show thinking
Gemini said
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription,
yes, you can configure it
, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
How the Transcription Engine Handles Language
By default, ScreenPipe uses a
multilingual Whisper model
and sets the language configuration to
Auto-Detect
.
When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage:
If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback:
Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
Open the ScreenPipe settings.
Navigate to the
Audio
or
AI/Transcription
section.
Look for the
Whisper Language
or
Transcription Language
dropdown.
Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).
If you are running ScreenPipe via CLI/Config:
You can modify your underlying configuration (usually found in
~/.screenpipe/pipe.json
or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:
(Slovak)
(Bulgarian)
(English)
If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
Good response
Bad response
Redo
Share and export
Copy
Show more options
Ask Gemini
Ask Gemini
Open upload file menu
Tools
Open mode picker
Pro
Microphone
Send message
Gemini is AI and can make mistakes, including about people.
Your privacy and Gemini Opens in a new window...
|
38740
|
NULL
|
NULL
|
NULL
|
|
39092
|
1441
|
82
|
2026-05-14T06:35:54.210048+00:00
|
/Users/lukas/.screenpipe/data/data/2026-05-14/1778 /Users/lukas/.screenpipe/data/data/2026-05-14/1778740554210_m1.jpg...
|
Firefox
|
Screenpipe — Archive — Personal
|
1
|
app.screenpipe.lakylak.xyz
|
monitor_1
|
NULL
|
NULL
|
NULL
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Expand
Listen
Show thinking
Gemini said
Gemini said
Yes, those
.mp4
files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your
~/.screenpipe/data/
directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the
db.sqlite
database along with
precise timestamps and file path references
.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific
.mp4
file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw
.mp4
files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that
.mp4
file is your fallback source of truth to verify what was actually said.
How the Database Views Them
How the Database Views Them
If you were to open up your
db.sqlite
and query the tables (e.g.,
audio_transcriptions
), you would see that the database treats the
~/.screenpipe/data/
folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like
soundcore AeroClip (input)_2026-05-12_07-40-48.mp4
.
What happens if you delete them?
If you manually
rm
those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
Managing the Storage Footprint
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection:
ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving:
If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly
older
files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
Edit
You said yes I will do that. Is there a way to setup languages to transcription?
You said
yes I will do that. Is there a way to setup languages to transcription?
Listen
Show thinking
Gemini said
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription,
yes, you can configure it
, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
How the Transcription Engine Handles Language
By default, ScreenPipe uses a
multilingual Whisper model
and sets the language configuration to
Auto-Detect
.
When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage:
If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback:
Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
Open the ScreenPipe settings.
Navigate to the
Audio
or
AI/Transcription
section.
Look for the
Whisper Language
or
Transcription Language
dropdown.
Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).
If you are running ScreenPipe via CLI/Config:
You can modify your underlying configuration (usually found in
~/.screenpipe/pipe.json
or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:
(Slovak)
(Bulgarian)
(English)
If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
Good response
Bad response
Redo
Share and export
Copy
Show more options
Ask Gemini
Ask Gemini
Open upload file menu
Tools
Open mode picker
Pro
Microphone
Send message
Gemini is AI and can make mistakes, including about people.
Your privacy and Gemini Opens in a new window
Your privacy and Gemini
Opens in a new window
Summarize page
Summarize page...
|
[{"role":"AXRadioButton","text [{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"All docs · AFFiNE","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"All docs · AFFiNE","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"DXP4800PLUS-B5F8","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"DXP4800PLUS-B5F8","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Screenpipe — Archive","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":true},{"role":"AXStaticText","text":"Screenpipe — Archive","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Close tab","depth":5,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXRadioButton","text":"SQLite Web: archive.db","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: archive.db","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"SQLite Web: db.sqlite","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"SQLite Web: db.sqlite","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Claude","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Claude","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Manage extra usage for paid Claude plans | Claude Help Center","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXRadioButton","text":"2 TB in 25 MB/s - Google Search","depth":4,"on_screen":true,"help_text":"","role_description":"tab","subrole":"AXTabButton","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"2 TB in 25 MB/s - Google Search","depth":5,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New Tab","depth":4,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Customize sidebar","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Close Google Gemini (⌃X)","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open history (⇧⌘H)","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Open bookmarks (⌘B)","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bitwarden","depth":6,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"AI Chat settings","depth":7,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Close","depth":7,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Google Account: Lukáš Koválik (kovaliklukas@gmail.com)","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Main menu","depth":12,"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"New chat","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini","depth":15,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"New chat","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share conversation","depth":11,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Open menu for conversation actions.","depth":12,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXHeading","text":"Conversation with Gemini","depth":15,"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Conversation with Gemini","depth":16,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe is fundamentally designed to be a","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"100% local, privacy-first","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":27,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"1. Where It Starts: The Capture Layer","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. Where It Starts: The Capture Layer","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"System Audio:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Microphone Input:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Everything you say into your mic.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. How It Gets Transferred and Transcribed","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. How It Gets Transferred and Transcribed","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe prioritizes local processing, the \"transfer\" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is the transcription pipeline:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Engine:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The raw audio chunks are fed into a local speech-to-text model.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"OpenAI Whisper","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Diarization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As it transcribes the text, the engine also performs \"diarization\"—a technical term for speaker identification.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"View source details. Opens side panel.","depth":29,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Storage:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The final transcribed text is then indexed into a local","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"SQLite database","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":". This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The \"Work in Progress\" (WIP) Stage","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The \"Work in Progress\" (WIP) Stage","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"There is a slight bottleneck between","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"hearing","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the audio and","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"saving","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"the text, which acts as the WIP stage:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Buffering:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Processing Queue:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Finalization:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"4. Understanding the Folder Structure","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"4. Understanding the Folder Structure","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The SQLite Database (","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"):","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is \"done.\"","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folders:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Temp Files:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you see rapidly changing files, temporary","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"chunks, or locked database journals (like","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Sources","depth":26,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Sources","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":21,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?","depth":23,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Expand","depth":21,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Listen","depth":24,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Yes, those","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Here is exactly what those files are doing sitting in your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"directory and how they interact with the rest of the system.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"1. The \"Time Machine\" Playback","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"1. The \"Time Machine\" Playback","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"database along with","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"precise timestamps and file path references","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"2. Re-processing and Model Upgrades","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"2. Re-processing and Model Upgrades","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"3. The Source of Truth for Hallucinations","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"3. The Source of Truth for Hallucinations","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"file is your fallback source of truth to verify what was actually said.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How the Database Views Them","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How the Database Views Them","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you were to open up your","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"db.sqlite","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"and query the tables (e.g.,","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"audio_transcriptions","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"), you would see that the database treats the","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/data/","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"soundcore AeroClip (input)_2026-05-12_07-40-48.mp4","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"What happens if you delete them?","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you manually","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"rm","depth":28,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a \"file not found\" error in the background logs.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Managing the Storage Footprint","depth":26,"on_screen":false,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Managing the Storage Footprint","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Because ScreenPipe records continuously, this folder will inevitably grow over time.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Built-in Garbage Collection:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Archiving:","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you prefer to keep a permanent, searchable \"life log\" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"older","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":24,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":24,"on_screen":true,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Share and export","depth":23,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":24,"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":23,"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy prompt","depth":21,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Edit","depth":21,"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXHeading","text":"You said yes I will do that. Is there a way to setup languages to transcription?","depth":21,"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You said","depth":23,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"yes I will do that. Is there a way to setup languages to transcription?","depth":23,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Listen","depth":24,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Show thinking","depth":28,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"Gemini said","depth":23,"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Gemini said","depth":24,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"As for setting up languages for transcription,","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"yes, you can configure it","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":", but it helps to understand how ScreenPipe handles language natively first.","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How the Transcription Engine Handles Language","depth":26,"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How the Transcription Engine Handles Language","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"By default, ScreenPipe uses a","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"multilingual Whisper model","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"and sets the language configuration to","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Auto-Detect","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":".","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Advantage:","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"The Drawback:","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXHeading","text":"How to Force a Specific Language","depth":26,"on_screen":true,"help_text":"","role_description":"heading","subrole":"AXUnknown"},{"role":"AXStaticText","text":"How to Force a Specific Language","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you are using the Desktop App UI:","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Open the ScreenPipe settings.","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Navigate to the","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Audio","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"AI/Transcription","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"section.","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Look for the","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Whisper Language","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Transcription Language","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"dropdown.","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Change it from \"Auto\" to your specific language (e.g., English, Bulgarian, or Slovak).","depth":29,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you are running ScreenPipe via CLI/Config:","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"You can modify your underlying configuration (usually found in","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"~/.screenpipe/pipe.json","depth":28,"bounds":{"left":0.0,"top":0.0,"width":0.134375,"height":0.020555556},"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:","depth":27,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"(Slovak)","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"(Bulgarian)","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"(English)","depth":29,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.","depth":27,"on_screen":false,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Good response","depth":23,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXCheckBox","text":"Bad response","depth":23,"on_screen":false,"help_text":"","role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Redo","depth":23,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Share and export","depth":22,"on_screen":false,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Copy","depth":23,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Show more options","depth":22,"on_screen":false,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXTextArea","text":"Ask Gemini","depth":20,"on_screen":true,"value":"Ask Gemini","help_text":"","role_description":"text entry area","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Ask Gemini","depth":21,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Open upload file menu","depth":20,"on_screen":true,"role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Tools","depth":18,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false,"is_expanded":false},{"role":"AXButton","text":"Open mode picker","depth":20,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Pro","depth":23,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXCheckBox","text":"Microphone","depth":19,"on_screen":true,"role_description":"toggle button","subrole":"AXToggle","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXButton","text":"Send message","depth":19,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":false,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Gemini is AI and can make mistakes, including about people.","depth":17,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXLink","text":"Your privacy and Gemini Opens in a new window","depth":17,"on_screen":true,"help_text":"","role_description":"link","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Your privacy and Gemini","depth":18,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXStaticText","text":"Opens in a new window","depth":19,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"},{"role":"AXButton","text":"Summarize page","depth":7,"on_screen":true,"help_text":"","role_description":"button","subrole":"AXUnknown","is_enabled":true,"is_focused":false,"is_selected":false},{"role":"AXStaticText","text":"Summarize page","depth":9,"on_screen":true,"help_text":"","role_description":"text","subrole":"AXUnknown"}]...
|
-8123936687174540342
|
9209073003857062877
|
click
|
accessibility
|
NULL
|
Screenpipe — Archive
Screenpipe — Archive
All docs Screenpipe — Archive
Screenpipe — Archive
All docs · AFFiNE
All docs · AFFiNE
DXP4800PLUS-B5F8
DXP4800PLUS-B5F8
Screenpipe — Archive
Screenpipe — Archive
Close tab
SQLite Web: archive.db
SQLite Web: archive.db
SQLite Web: db.sqlite
SQLite Web: db.sqlite
Claude
Claude
Manage extra usage for paid Claude plans | Claude Help Center
Manage extra usage for paid Claude plans | Claude Help Center
2 TB in 25 MB/s - Google Search
2 TB in 25 MB/s - Google Search
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Open history (⇧⌘H)
Open bookmarks (⌘B)
Bitwarden
AI Chat settings
Close
Google Account: Lukáš Koválik ([EMAIL])
Main menu
New chat
Gemini
New chat
Share conversation
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
You said
Exaplin me: how does the ScreenPipe audio recording work? Where did it start? How is it getting transferred and transcribed? Is there some work in progress stage? How can I see from the folder structure what has been done and what hasn't been done?
Listen
Show thinking
Gemini said
Gemini said
Here is a breakdown of how ScreenPipe handles audio recording, from the moment it captures sound to how it organizes the data on your machine.
ScreenPipe is fundamentally designed to be a
100% local, privacy-first
application, meaning the vast majority of its processing happens directly on your hardware without relying on the cloud.
View source details. Opens side panel.
1. Where It Starts: The Capture Layer
1. Where It Starts: The Capture Layer
The process begins locally by tapping directly into your operating system’s audio routing. ScreenPipe simultaneously captures two separate audio streams:
System Audio:
Everything your computer outputs (e.g., the voices of other people in a Zoom or Teams meeting, a YouTube video, or Spotify).
Microphone Input:
Everything you say into your mic.
View source details. Opens side panel.
Because ScreenPipe acts as a 24/7 background process, it continuously listens to these streams and breaks the continuous audio into manageable chunks to prepare them for the next stage.
2. How It Gets Transferred and Transcribed
2. How It Gets Transferred and Transcribed
Because ScreenPipe prioritizes local processing, the "transfer" step is incredibly short. The audio data never leaves your machine unless you explicitly set up a cloud plugin.
Here is the transcription pipeline:
The Engine:
The raw audio chunks are fed into a local speech-to-text model.
By default, ScreenPipe uses
OpenAI Whisper
running locally on your hardware. (Users can also configure it to use cloud providers like Deepgram if they need faster processing, but local Whisper is the standard).
View source details. Opens side panel.
Diarization:
As it transcribes the text, the engine also performs "diarization"—a technical term for speaker identification.
It analyzes the audio to distinguish between your voice and the voices of others, labeling who said what.
View source details. Opens side panel.
Storage:
The final transcribed text is then indexed into a local
SQLite database
. This database uses Full-Text Search (FTS5), which is what allows you to instantly search for a phrase you heard in a meeting three weeks ago.
3. The "Work in Progress" (WIP) Stage
3. The "Work in Progress" (WIP) Stage
There is a slight bottleneck between
hearing
the audio and
saving
the text, which acts as the WIP stage:
Buffering:
Audio is recorded into a temporary buffer in your system's RAM or written as temporary chunk files on your disk.
Processing Queue:
Whisper requires CPU/GPU resources to transcribe. If you are having a rapid, multi-person conversation, the raw audio chunks line up in a processing queue.
Finalization:
Once the AI finishes transcribing a chunk, the text is committed to the SQLite database alongside a timestamp, and the raw audio is compressed and saved so you can play it back later.
4. Understanding the Folder Structure
4. Understanding the Folder Structure
All of ScreenPipe's data lives locally on your machine (usually in a hidden directory like
~/.screenpipe
on Mac/Linux or in your AppData folder on Windows). You can determine what has been processed by looking at a few key areas in this directory:
The SQLite Database (
):
This is the master ledger. If text exists inside this database, it means the audio has been fully transcribed, diarized, and is "done."
The
or
folders:
This is where the compressed, raw audio files and JPEG screenshots are stored permanently. Think of this as the raw archive.
Temp Files:
If you see rapidly changing files, temporary
chunks, or locked database journals (like
), this indicates the system is actively processing a backlog of audio (the WIP stage). Once the engine catches up, these temporary chunk files are usually cleared out or moved to permanent storage, and the database updates.
Are you setting up ScreenPipe for a specific workflow (like summarizing meetings or tracking your daily tasks), or just exploring how the architecture works under the hood?
Sources
Sources
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
You said I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
You said
I see the files in ~/.screenpipe/data/ as LakyLak bose qc35 II (input)_2026-05-12_06-49-17.mp4 or MacBook Pro Microphone (input)_2026-05-12_12-17-23.mp4 or soundcore AeroClip (input)_2026-05-12_07-40-48.mp4 or System Audio (output)_2026-05-11_06-17-14.mp4. Are these used for anything after transcribing and storing in sqlite db?
Expand
Listen
Show thinking
Gemini said
Gemini said
Yes, those
.mp4
files are absolutely still used after the transcription process is complete. They are not temporary processing chunks; they act as your permanent media archive.
Here is exactly what those files are doing sitting in your
~/.screenpipe/data/
directory and how they interact with the rest of the system.
1. The "Time Machine" Playback
1. The "Time Machine" Playback
The primary reason ScreenPipe keeps these files is for audio playback. When Whisper transcribes your meetings or ambient audio, it writes the text into the
db.sqlite
database along with
precise timestamps and file path references
.
When you search your ScreenPipe history for a specific conversation, the UI doesn't just show you the text—it uses the file path stored in the database to pull up that specific
.mp4
file and starts playing it at the exact second the keyword was spoken. Without these files, you get the transcript, but you lose the ability to actually listen to the context, tone, or nuance of the original conversation.
2. Re-processing and Model Upgrades
2. Re-processing and Model Upgrades
Local LLMs and transcription models are improving rapidly. Right now, your system might be using a quantized Whisper model to balance CPU load with accuracy. If you later decide to run a heavier, more accurate model—or if a radically better speaker diarization algorithm is released—having the raw
.mp4
files allows you to re-process historical audio. Without the raw input files, your data is permanently locked to the quality of the transcription model you were running on May 12th.
3. The Source of Truth for Hallucinations
3. The Source of Truth for Hallucinations
Speech-to-text engines, even great ones, occasionally hallucinate or misinterpret technical jargon, overlapping voices, or background noise. If a database query returns a completely nonsensical sentence, that
.mp4
file is your fallback source of truth to verify what was actually said.
How the Database Views Them
How the Database Views Them
If you were to open up your
db.sqlite
and query the tables (e.g.,
audio_transcriptions
), you would see that the database treats the
~/.screenpipe/data/
folder essentially as a blob storage backend. The tables contain columns acting as soft foreign keys pointing to filenames like
soundcore AeroClip (input)_2026-05-12_07-40-48.mp4
.
What happens if you delete them?
If you manually
rm
those files to free up space on your Mac's SSD, ScreenPipe won't crash. The SQLite database will still hold all your searchable text. However, any attempt to play back the audio for those older transcripts will silently fail or throw a "file not found" error in the background logs.
Managing the Storage Footprint
Managing the Storage Footprint
Because ScreenPipe records continuously, this folder will inevitably grow over time.
Built-in Garbage Collection:
ScreenPipe usually handles this automatically based on retention settings (e.g., deleting media older than 30 days or when the folder hits a specific GB limit).
Archiving:
If you prefer to keep a permanent, searchable "life log" without eating up your local SSD space, you don't necessarily have to let ScreenPipe delete them. You could script a simple cron job to regularly
older
files to your massive 4-drive RAID 5 array, replacing the local files with symlinks. This would keep the database references intact while offloading the heavy media storage to your NAS.
Good response
Bad response
Share and export
Copy
Show more options
Copy prompt
Edit
You said yes I will do that. Is there a way to setup languages to transcription?
You said
yes I will do that. Is there a way to setup languages to transcription?
Listen
Show thinking
Gemini said
Gemini said
That is a solid plan. Offloading those older media files to your 4-drive RAID 5 setup (or even the secondary SSD-only Beelink NAS if you end up building it) will keep your local SSD lean while keeping your database searchable.
As for setting up languages for transcription,
yes, you can configure it
, but it helps to understand how ScreenPipe handles language natively first.
How the Transcription Engine Handles Language
How the Transcription Engine Handles Language
By default, ScreenPipe uses a
multilingual Whisper model
and sets the language configuration to
Auto-Detect
.
When a new audio chunk comes in, the model listens to the first few seconds of audio, guesses the language, and transcribes the rest of that chunk using that assumption.
The Advantage:
If you are watching a tutorial in English, and then an hour later you are in a Zoom call speaking Bulgarian, ScreenPipe will automatically switch gears without you having to touch the settings.
The Drawback:
Auto-detect can sometimes guess wrong, especially if there is heavy background noise, if someone starts a sentence with a foreign loanword, or if you are rapidly code-switching between Slovak and English in the same conversation. When it guesses wrong, it might try to force Slovak words into English spelling, resulting in gibberish.
How to Force a Specific Language
How to Force a Specific Language
If you want to ensure maximum accuracy and you know you will primarily be recording in one language for a specific period, you can explicitly set the language. This also slightly speeds up the processing because the model skips the detection phase.
If you are using the Desktop App UI:
Open the ScreenPipe settings.
Navigate to the
Audio
or
AI/Transcription
section.
Look for the
Whisper Language
or
Transcription Language
dropdown.
Change it from "Auto" to your specific language (e.g., English, Bulgarian, or Slovak).
If you are running ScreenPipe via CLI/Config:
You can modify your underlying configuration (usually found in
~/.screenpipe/pipe.json
or by passing a flag when starting the daemon) to force a specific language code. You would use the standard ISO language codes:
(Slovak)
(Bulgarian)
(English)
If you find yourself frequently jumping between those three languages, your best bet is to leave it on Auto-Detect and perhaps run a slightly larger, more accurate Whisper model (like Whisper 'Small' or 'Base' instead of 'Tiny') if your hardware has the headroom for it.
Good response
Bad response
Redo
Share and export
Copy
Show more options
Ask Gemini
Ask Gemini
Open upload file menu
Tools
Open mode picker
Pro
Microphone
Send message
Gemini is AI and can make mistakes, including about people.
Your privacy and Gemini Opens in a new window
Your privacy and Gemini
Opens in a new window
Summarize page
Summarize page...
|
39088
|
NULL
|
NULL
|
NULL
|