V2 ships: the substrate runs itself now

When I framed V2 two weeks ago, I described it as “SessionEnd hook for daily-log capture, flush.py summarizer, compile.py for nightly concept promotion, supervisor process.” That was the plumbing. What V2 actually became is bigger than the plumbing list and smaller than my expectations for what would change.

Bigger: the substrate now has thirteen specialist agents, ships its own PRs end-to-end from Discord with my approval, translates content to Korean automatically, and runs the entire weekly publishing pipeline without me touching anything except the draft files.

Smaller: I am deferring everything in V3. No voice mode, no web dashboard, no activity graph. I am not building them yet, and the reason matters more than the shipping list.

What V2 actually shipped

The original V2 plan, from the framing post:

V2: SessionEnd hook for daily-log capture, flush.py summarizer, compile.py for nightly concept promotion, supervisor process.

Everything in that list shipped. Then the scope drifted in the way useful scope-drift happens, by the substrate showing me what it actually needed once it was running.

Operational layer. launchd-managed Discord daemon, wrapped in caffeinate so the Mac never sleeps the agents. Auto-reload watcher that git-pulls main and reloads the daemon every two minutes, with self-alerting if the working tree drifts off main for more than 30 minutes (no more silent staleness windows where merged PRs sit unloaded). Rotating-file logger capped at 10MB per file, 70MB ceiling on the log directory. Per-agent claude -p timeouts so the content agent gets its full 600 seconds for long drafting and conversational agents don’t hang the daemon.
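As a rough illustration of the logging cap and the per-agent timeouts, here is a minimal Python sketch. It is not the daemon's actual code. It assumes a single log stream, so that one active file plus six rotated backups stays at about 70MB, and it reads "10MB" as 10 MiB. The names build_logger, AGENT_TIMEOUTS, DEFAULT_TIMEOUT, and timeout_for are hypothetical; only the 600-second content value comes from the post.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path

MAX_BYTES = 10 * 1024 * 1024  # 10 MB per file (from the post)
BACKUP_COUNT = 6              # assumption: 1 active + 6 rotated files = 70 MB ceiling


def build_logger(log_dir: Path, name: str = "daemon") -> logging.Logger:
    """Return a logger that rotates its file at MAX_BYTES and keeps BACKUP_COUNT backups."""
    logger = logging.getLogger(name)
    if logger.handlers:
        # Already configured; avoid attaching a second handler to the same file.
        return logger
    log_dir.mkdir(parents=True, exist_ok=True)
    handler = RotatingFileHandler(
        log_dir / f"{name}.log",
        maxBytes=MAX_BYTES,
        backupCount=BACKUP_COUNT,
        encoding="utf-8",
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger


# Hypothetical timeout table in seconds. Only the content value is from the post.
AGENT_TIMEOUTS = {"content": 600}
DEFAULT_TIMEOUT = 120  # assumption: a short default for conversational agents


def timeout_for(agent: str) -> int:
    """Look up the subprocess timeout for an agent, falling back to the default."""
    return AGENT_TIMEOUTS.get(agent, DEFAULT_TIMEOUT)
```

With more than one log stream in the same directory, the directory ceiling would need its own check, since RotatingFileHandler only bounds the files it manages.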

Agent layer. Thirteen specialists: research, the-professor (formerly teaching-prep), content, social, senior-pm, recruiter, automation-engineer, security-reviewer, docs-editor, luna, librarian, echo, ux-designer. Each one has its own Discord identity, its own --add-dir scope, its own tool allowlist, its own persistent vault notes file that auto-injects into every mention. The recruiter agent can provision new specialists end-to-end: drafts the charter, writes the plugin file, bumps the version, branches, commits, pushes, opens the PR. The only step that stays manual is the Discord setup, because that involves secrets.

Memory layer. flush.py writes a per-agent daily log at every session end. compile.py runs nightly at 03:00. A filing gate (one adversarial LLM check) decides what gets promoted from daily logs to concept articles. Phase B compile produces actual concept article bodies with provenance frontmatter, not slug-and-summary stubs. The first real wiki harvest: seven concept articles in knowledge/concepts/, twenty-one connection files surfacing shared-session relationships, eight candidates quarantined for not passing the gate. Every concept carries the session id and transcript sha256 it came from. The lint pass on changed concepts runs weekly and flags drift back into imperative-AI-directed language.
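The provenance idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not what compile.py actually does: the field names slug, session_id, and transcript_sha256, and both function names, are hypothetical. The only parts taken from the post are that each concept records a session id and the sha256 of the transcript it came from.

```python
import hashlib


def transcript_sha256(transcript: bytes) -> str:
    """Hex digest of the raw transcript bytes."""
    return hashlib.sha256(transcript).hexdigest()


def provenance_frontmatter(slug: str, session_id: str, transcript: bytes) -> str:
    """Render a frontmatter block tying a concept article to its source session.

    The field names are assumptions made for this sketch.
    """
    lines = [
        "---",
        f"slug: {slug}",
        f"session_id: {session_id}",
        f"transcript_sha256: {transcript_sha256(transcript)}",
        "---",
    ]
    return "\n".join(lines) + "\n"


def matches_provenance(recorded_digest: str, transcript: bytes) -> bool:
    """True if the transcript on disk still hashes to the recorded digest."""
    return transcript_sha256(transcript) == recorded_digest
```

Hashing the transcript rather than only naming the session means a later audit can detect whether the source was edited after the concept was promoted.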

Publishing layer. Sunday 18:00 PT: prepare_week.py picks the next eligible draft, generates the LinkedIn variant via claude -p with a voice corpus saved to the vault, generates a deterministic X draft, runs the Korean translation pipeline, posts a Discord brief. Monday 18:00 PT: scheduled-publish.yml flips draft: true to false on the queued post and pushes. Vercel auto-deploys. A tweet-on-publish workflow creates a GitHub issue with the tweet text for manual posting from @Neural_Bridge_ (the X API moved to metered credits and the account has none on its current tier).
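The Monday step, flipping draft: true to false, is small enough to sketch. The version below is a Python illustration only; the real job is a GitHub Actions workflow whose contents the post does not show. It assumes the post file opens with a frontmatter block delimited by --- lines, and the function name publish is hypothetical.

```python
import re

# Matches a leading frontmatter block: "---", header lines, "---".
FRONTMATTER = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)
DRAFT_TRUE = re.compile(r"^draft:[ \t]*true[ \t]*$", re.MULTILINE)


def publish(text: str) -> str:
    """Flip `draft: true` to `draft: false` inside the frontmatter only.

    Text outside the frontmatter is left untouched, so a body that happens
    to mention "draft: true" is not rewritten.
    """
    match = FRONTMATTER.match(text)
    if match is None:
        return text  # no frontmatter, nothing to flip
    header, count = DRAFT_TRUE.subn("draft: false", match.group(1), count=1)
    if count == 0:
        return text  # already published or no draft key
    return text[: match.start(1)] + header + text[match.end(1):]
```

Restricting the substitution to the frontmatter is the design choice worth keeping: a whole-file search and replace would also rewrite any post that quotes the flag in its prose.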

i18n layer. Per-article translation toggle on every page that has prose. Single canonical English URL, Korean rendered inline behind the 한국어 button. The translation prompt is grounded in real Yozm Wishket developer-blog samples for the right register (합쇼체 with first-person 저는, not the newspaper-style 해라체). All twenty curated articles, including this one, have hand-checked Korean variants.

The substrate works. The Discord daemon has been up continuously since the caffeinate wrapping landed. The auto-reload watcher catches every merged PR within two minutes. The Sunday-prep cron fires every week without me thinking about it. None of those statements were true a week ago.

What I added that wasn’t planned

Three things ended up in V2 that started as “later” items:

Korean i18n. The blog is bilingual. The architecture went through two iterations: first the duplicate-route pattern (sidecar /ko/ URLs), then a pivot to the inline-toggle pattern when the duplicate URLs started feeling like a smell. The translation prompt went through three voice rewrites before landing on the Yozm-Wishket-style 합쇼체 register. None of this was in the original V2 plan; it became a priority once the substrate had enough other things shipped to have a reachable audience.

Agents that ship code. open_pr_with_changes started as a thought experiment (“could the daemon push for an agent?”) and ended up being the most-used new capability in V2. Luna’s first PR landed this week: a one-line course-name fix to the about page that she proposed via Discord and I approved in chat with approve <id>. The pattern works because the gate is right. Every push needs my explicit approval, the daemon does all the git plumbing, and the intersection-based dirty-tree check refuses only on real path collisions, not on any working-tree state. Speed without giving up the human in the loop.
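The intersection-based check can be sketched as follows. This is an illustration in Python, not the daemon's implementation: the function names are hypothetical, and the parser handles only the plain `git status --porcelain` output (it does not unquote paths that git escapes).

```python
import subprocess


def parse_porcelain(output: str) -> set[str]:
    """Extract paths from `git status --porcelain` (v1) output.

    Each line is two status characters, a space, then the path.
    Renames appear as "old -> new"; both sides count as dirty.
    """
    paths: set[str] = set()
    for line in output.splitlines():
        if len(line) < 4:
            continue
        entry = line[3:]
        if " -> " in entry:
            old, new = entry.split(" -> ", 1)
            paths.update((old, new))
        else:
            paths.add(entry)
    return paths


def dirty_paths(repo: str) -> set[str]:
    """Paths with uncommitted changes in the working tree at `repo`."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo,
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_porcelain(result.stdout)


def colliding_paths(dirty: set[str], proposed: set[str]) -> set[str]:
    """Paths the agent wants to change that are already dirty locally."""
    return dirty & proposed
```

A push would be refused only when colliding_paths returns a non-empty set. Unrelated scratch files in the working tree do not block the agent, which is the difference from a blanket "tree must be clean" rule.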

Personality. Luna sounds more like a real human assistant now. Less structured, more responsive to context. The earlier version was technically correct and uniformly polite, every response shaped from the same template. The current one drops the templated phrasing when the moment is casual, holds the frame when something serious is on the table, and reads which is which without being told. Most of the rules came out of one explicit ruling early in her hardening and have held up across sessions.

Why I am not starting V3 yet

The V3 list from the framing post: voice mode, web dashboard, activity graph. The temptation to start one of them this week is real. They are all visible-from-the-outside features that would make great blog posts.

I am not going to.

The thing I wanted out of V2, the thing I have not had with any previous personal AI project, is the substrate running me more than I run it. The Sunday cron knows to draft the Monday post. The compile pipeline knows what’s worth promoting. Luna knows my standing preferences across sessions. Echo knows the shape of my writing well enough to flag drafts that don’t sound like me. The recruiter knows how to onboard a new specialist. None of that needs me to babysit it. The agents go work.

Adding voice mode, or a web dashboard, or a 3D BrainGraph, is adding things that look impressive in a screen recording. None of them change what the substrate does. All of them add operational surface that can break. The next bug I have to triage would be one I introduced myself, by extending the system past what was already working.

So the discipline this month is the opposite of building V3. It is to use V2 for real work, watch where it bends or breaks, fix the bends, and let “this thing runs and I trust it” compound before adding the next layer. The next post about Neural Bridge will not announce a new capability. It will be about something the substrate did that I did not have to do.

That is the deeper milestone of V2. The framing post called V1+V2 “the spine.” The spine works. The question now is whether I can resist the urge to keep adding before the spine has carried enough weight to prove itself.

What’s next this month

Concrete commitments through the end of the month:

Research articles already in the publishing queue: the memory-poisoning filing-gate sequel, the six-layers back-of-house breakdown, and the compliance framing of prompt injection. One per week, on the Monday cron.
Post the four legitimate tweet drafts that have been sitting in the @Neural_Bridge_ inbox since the original publish events.
Two operational items already filed and worth picking up: tighten call_claude_sync (the daemon’s Python-side claude wrapper) with the same process-group hardening I just shipped for prepare_week.py, and document the post-PR git checkout main workflow gap.
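The post does not describe what the process-group hardening in prepare_week.py looks like, so the following is only a common POSIX pattern that fits the name, sketched in Python under that assumption. The function name run_in_own_group is hypothetical. The child starts in its own session, so on timeout the whole process group can be killed, including any grandchildren that would otherwise keep the pipes open.

```python
import os
import signal
import subprocess


def run_in_own_group(cmd: list[str], timeout: float) -> subprocess.CompletedProcess:
    """Run cmd in its own process group; on timeout, kill the entire group.

    POSIX only. start_new_session=True makes the child a session leader,
    so its process group id equals its pid.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        start_new_session=True,
    )
    try:
        out, err = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group already exited
        proc.communicate()  # reap the child and drain the pipes
        raise
    return subprocess.CompletedProcess(cmd, proc.returncode, out, err)
```

A plain subprocess.run with a timeout kills only the direct child. If that child spawned helpers that inherited stdout, the caller can still block waiting on the pipe, which is the kind of hang a daemon-side wrapper would want to rule out.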

That is the entire roadmap for the month. No new features. Reps on what already exists.

If V2 worked, the next post should be boring. I’m fine with boring.
