
Real-Time Corporate Chat & Announcements
Real-time corporate messaging with group chats, polls, broadcast announcements and push notifications, embedded into the company's ERP.
Technical documentation
Overview
A real-time corporate messaging platform built as a module of Grupo Garza Limón's ERP. It is not a generic chat app: it lives inside the company ecosystem, reuses the ERP's identity, and embeds into its existing pages. It covers 1:1 and group conversations, media messages (image, audio, video, documents, PPTX), emoji reactions, replies/quotes, read receipts and typing indicators, plus WhatsApp-style polls, broadcast announcements and mobile push notifications.
Requirements
Functional: direct and group chat; media sharing; reactions, replies and read receipts; real-time presence (online, idle, offline, on-vacation); single- or multiple-choice polls; broadcast announcements (to everyone or targeted) with reactions and per-recipient read confirmation; push notifications to offline users; ERP-driven user activation/deactivation/reactivation; reconnection sync.
Non-functional: low-latency delivery (WebSocket), resilience to disconnects (cursor-based delta sync), guaranteed push delivery (persistent queue with retries and receipt verification), and authentication without managing its own credentials (the ERP issues the JWT). Constraints: on-premise deployment on the company's existing infrastructure (PM2, no containers), and a frontend that can be injected into legacy ERP pages with no build step.
Architecture
A modular NestJS backend organized by domain: auth, chat, comunicados (announcements), presence, gateway (WebSocket), aws and push. The HTTP request lifecycle is: request → AuthGuard (verifies the JWT from the header) → controller → service → TypeORM repository, with a @User() decorator that makes the verified JWT payload available to any controller handler. To keep controllers clean, composite decorators (GetEndpoint, PostEndpoint, PatchEndpoint, DeleteEndpoint) bundle route, Swagger docs and auth into a single annotation.
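The lifecycle above can be pictured as a chain of plain functions. This is a framework-free sketch, not the project's NestJS code: every name in it (authGuard, VerifyToken, ChatRepository, the permisos claim) is hypothetical, and token verification is assumed to sit behind an injected function.

```typescript
// Framework-free stand-in for the request lifecycle; every name here is hypothetical.
type JwtPayload = { idErp: string; permisos: number[] };
type HttpRequest = { headers: Record<string, string | undefined> };
// Assumed: signature and expiry checks live behind this function.
type VerifyToken = (token: string) => JwtPayload | null;
type ChatRepository = { findChatIdsByUser(idErp: string): number[] };

// Guard: extract the bearer token from the Authorization header and verify it.
function authGuard(req: HttpRequest, verify: VerifyToken): JwtPayload {
  const match = /^Bearer\s+(\S+)$/i.exec(req.headers["authorization"] ?? "");
  const token = match ? match[1] : undefined;
  if (!token) throw new Error("401: missing bearer token");
  const payload = verify(token);
  if (!payload) throw new Error("401: invalid token");
  return payload;
}

// Service: business logic, talks only to the repository.
function listChats(repo: ChatRepository, user: JwtPayload): number[] {
  return repo.findChatIdsByUser(user.idErp);
}

// Controller: receives the payload the guard produced (the role @User() plays).
function getChatsController(repo: ChatRepository, user: JwtPayload): { chats: number[] } {
  return { chats: listChats(repo, user) };
}

// request -> guard -> controller -> service -> repository
function handleGetChats(
  req: HttpRequest,
  verify: VerifyToken,
  repo: ChatRepository,
): { chats: number[] } {
  const user = authGuard(req, verify);
  return getChatsController(repo, user);
}
```

Because the guard is the only step that touches the header, everything downstream works with an already-verified payload, which is the property the composite decorators rely on.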
The real-time layer uses Socket.IO with a room scheme: each user joins user:<idErp> and a chat:<chatId> room per conversation they belong to. When a chat's participants change, the GatewayService moves sockets between rooms. Outbound events (mensaje:nuevo, mensaje:leido, chat:escribiendo, usuario:estatus, encuesta:voto, comunicado:reaccion) and inbound ones (chat:typing, presence:status, chat:marcar_leido, chat:reaccionar) all ride a single connection authenticated by JWT in the handshake.
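The room scheme can be sketched with two naming helpers and a membership diff. The room name formats come from the text; the function names, and the idea that moving sockets reduces to a join/leave diff of participant idErp lists, are assumptions for illustration rather than the GatewayService's actual code.

```typescript
// Room names follow the scheme described above: user:<idErp> and chat:<chatId>.
const userRoom = (idErp: string): string => `user:${idErp}`;
const chatRoom = (chatId: string | number): string => `chat:${chatId}`;

// Rooms a socket should join on connect: its personal room plus one per chat.
function roomsForUser(idErp: string, chatIds: Array<string | number>): string[] {
  return [userRoom(idErp), ...chatIds.map(chatRoom)];
}

// Hypothetical helper: when a chat's participants change, work out whose
// sockets must join the chat room and whose must leave it.
function diffParticipants(
  before: string[],
  after: string[],
): { join: string[]; leave: string[] } {
  const prev = new Set(before);
  const next = new Set(after);
  return {
    join: after.filter((id) => !prev.has(id)),
    leave: before.filter((id) => !next.has(id)),
  };
}
```

For example, roomsForUser("E7", [3, 9]) yields user:E7, chat:3 and chat:9, and changing a chat's participants from E1, E2 to E2, E3 produces one join (E3) and one leave (E1).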
Fine-grained authorization is handled by a permissions guard that reads ERP permission codes embedded in the JWT (565 delete chats, 566 delete messages, 567 create announcements, 568 delete announcements). The backend manages no users or passwords: it acts as a resource server trusting the ERP as the token issuer (de-facto SSO), which greatly simplified the security model.
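A permission check of this kind might look like the sketch below. The four codes are the ones listed above; the claim name permisos, the constant names, and the choice to require all listed codes (rather than any one) are assumptions.

```typescript
// ERP permission codes named in the text; the constant names are hypothetical.
const PERMISSIONS = {
  DELETE_CHATS: 565,
  DELETE_MESSAGES: 566,
  CREATE_ANNOUNCEMENTS: 567,
  DELETE_ANNOUNCEMENTS: 568,
} as const;

type PermissionCode = (typeof PERMISSIONS)[keyof typeof PERMISSIONS];

// Assumed token shape: the ERP embeds the granted codes in a "permisos" claim.
type TokenPayload = { idErp: string; permisos: number[] };

// True only when every required code is present in the token.
function hasPermissions(payload: TokenPayload, required: PermissionCode[]): boolean {
  const granted = new Set<number>(payload.permisos);
  return required.every((code) => granted.has(code));
}
```

Since the codes travel inside the token, the check needs no database lookup; the trade-off is that a permission revoked in the ERP only takes effect once a new JWT is issued.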
Database
PostgreSQL with TypeORM and versioned migrations (22 of them; with synchronize: false and migrationsRun: true, pending migrations are applied automatically on startup instead of the schema being derived from the entities). Core model: usuarios, chats, usuarios_chat (membership + role), mensajes, estatus_mensajes (read receipts), reacciones_mensajes, the three poll tables (encuestas, encuestas_opciones, encuestas_votos), the announcement tables (comunicados, comunicado_destinatarios, comunicado_estatus, reacciones_comunicados), dispositivos_usuario (Expo push tokens) and queue_mensajes (push delivery queue).
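As a configuration fragment, the two flags would sit in the TypeORM data source options roughly as follows. Connection details are omitted, and the migrations path is a hypothetical placeholder, not the project's real layout.

```typescript
// Sketch of the TypeORM options described above; not the project's actual file.
const dataSourceOptions = {
  type: "postgres",
  synchronize: false, // never derive the schema from entity metadata
  migrationsRun: true, // apply pending migrations when the connection initializes
  migrations: ["dist/migrations/*.js"], // assumed location of compiled migrations
};
```

With this combination the migration files are the single source of truth for the schema, and a deploy needs no separate migration step.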
A key modeling decision is the user's dual identity: the ERP identifies each employee with an idErp (string), stored as a unique external key, while internally an auto-increment numeric PK is used for TypeORM joins. Users are created lazily: the record is created the first time a user connects with a valid JWT. When an employee is deactivated, a soft-delete is applied and an @AfterLoad hook masks their name, email and image ("deactivated user"), so conversation history stays coherent without exposing the former employee's data. All relevant entities use soft-delete.
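The dual identity, lazy creation and masking can be sketched in memory as below. Field names (nombre, email, imagen, deletedAt) and both helpers are assumptions; the real hook mutates the loaded TypeORM entity, whereas this sketch returns a masked copy to keep it side-effect free.

```typescript
// Hypothetical user shape: numeric internal PK plus the ERP's string identifier.
type UserRecord = {
  id: number;
  idErp: string;
  nombre: string;
  email: string | null;
  imagen: string | null;
  deletedAt: Date | null; // set on soft-delete
};

// In-memory stand-in for the usuarios table with a unique key on idErp.
function createUserStore() {
  const byIdErp = new Map<string, UserRecord>();
  let nextId = 1;
  return {
    // Lazy creation: return the existing row, or create it on first sight.
    ensureUser(idErp: string, nombre: string, email: string | null): UserRecord {
      const existing = byIdErp.get(idErp);
      if (existing) return existing;
      const created: UserRecord = {
        id: nextId++, idErp, nombre, email, imagen: null, deletedAt: null,
      };
      byIdErp.set(idErp, created);
      return created;
    },
  };
}

// What the @AfterLoad hook achieves: soft-deleted users keep their identifiers
// (so history still joins) but lose their personal data.
function maskIfDeactivated(user: UserRecord): UserRecord {
  if (user.deletedAt === null) return user;
  return { ...user, nombre: "deactivated user", email: null, imagen: null };
}
```

Keeping id and idErp intact in the masked copy is what lets old messages still resolve to an author while showing nothing personal.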
Technical decisions and trade-offs
The most interesting design decision was to model a poll as a message (TipoMensaje.ENCUESTA) rather than as a parallel entity. As a result, polls inherit, with no extra code, persistence, chronological ordering, cursor pagination, socket delivery (mensaje:nuevo), read receipts, quoting, soft-delete and delta sync. Poll-specific data lives in child tables and is serialized into the message. Voting uses set semantics (the client sends the full desired selection and the server reconciles it in a transaction), covering both vote-changing (single-choice) and toggling (multiple-choice) with one path.
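The set semantics of voting reduce to a diff between the stored selection and the desired one. The sketch below shows that reconciliation as a pure function; the names and the validation rules are assumptions, and in the real service the resulting inserts and deletes would run inside the transaction mentioned above.

```typescript
type VoteDiff = { add: number[]; remove: number[] };

// current: option ids the user has already voted for.
// desired: the full selection the client wants to end up with.
function reconcileVotes(
  current: number[],
  desired: number[],
  multipleChoice: boolean,
  validOptions: number[],
): VoteDiff {
  const wanted = Array.from(new Set(desired)); // ignore duplicate ids
  if (!multipleChoice && wanted.length > 1) {
    throw new Error("single-choice poll accepts at most one option");
  }
  const valid = new Set(validOptions);
  for (const optionId of wanted) {
    if (!valid.has(optionId)) throw new Error(`unknown option ${optionId}`);
  }
  const have = new Set(current);
  const want = new Set(wanted);
  return {
    add: wanted.filter((id) => !have.has(id)), // votes to insert
    remove: current.filter((id) => !want.has(id)), // votes to delete
  };
}
```

Changing a single-choice vote from option 1 to option 2 gives add [2], remove [1]; in a multiple-choice poll, going from [1, 2] to [1, 3] gives add [3], remove [2]. Both cases follow the same code path, and sending an empty selection withdraws the vote.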
By contrast, announcements are NOT messages: their delivery semantics differ (one-shot, per-recipient read status, not tied to a chat), so they were modeled separately. This asymmetry (reusing the message abstraction where it fits and splitting it where it doesn't) also shows up in push: messages and polls go through the persistent queue_mensajes queue (whose unique key is device+message), while reactions and announcements are sent directly via the Expo SDK, since they're ephemeral/one-shot and would collide with that constraint.
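The push split can be illustrated with a routing function and an in-memory queue keyed by device+message. The kind labels, function names and row shape are hypothetical; only the routing rule and the unique key come from the text.

```typescript
type PushKind = "mensaje" | "encuesta" | "reaccion" | "comunicado";
type PushRoute = "queue" | "direct";

// Messages and polls use the persistent queue; reactions and announcements
// are sent straight away.
function routePush(kind: PushKind): PushRoute {
  return kind === "mensaje" || kind === "encuesta" ? "queue" : "direct";
}

type QueueRow = { deviceId: number; messageId: number; attempts: number };

// In-memory stand-in for queue_mensajes with its unique (device, message) key.
function createPushQueue() {
  const rows = new Map<string, QueueRow>();
  return {
    // Returns false when the pair is already queued, as the unique key would.
    enqueue(deviceId: number, messageId: number): boolean {
      const key = `${deviceId}:${messageId}`;
      if (rows.has(key)) return false;
      rows.set(key, { deviceId, messageId, attempts: 0 });
      return true;
    },
    size(): number {
      return rows.size;
    },
  };
}
```

The second enqueue of the same pair is rejected, which is desirable for messages (no double notification) but would swallow a second reaction on the same message for the same device; that is one way to read the collision mentioned above.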
The frontend has two clients sharing a state layer (chat-store.js): the full chat app (chat.bundle.js, ~7,700 lines) in vanilla JavaScript and a React widget (~6,400 lines) meant to be injected into legacy ERP pages, loading React and Socket.IO from a CDN. Both bundles are IIFEs with no build step: they're edited directly. The trade-off is clear: deployment simplicity (just serve the file) at the cost of modern tooling, a constraint imposed by having to coexist with a legacy frontend.
Development, challenges and future work
The core challenges were resilience and consistency. To survive reconnections, a cursor-based delta sync was implemented: the client sends the last known message id per chat and gets back only what's new, avoiding full history reloads. Push delivery was solved with a cron every 5 seconds that processes the queue under a pessimistic lock to prevent double sends, a per-minute cron that verifies Expo receipts, and automatic cleanup of dead tokens (DeviceNotRegistered).
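The delta sync idea can be sketched as a filter over messages using per-chat cursors. This assumes message ids increase monotonically (plausible with an auto-increment key, but an assumption here); the type and function names are illustrative, and a real implementation would run one indexed query per chat rather than filter an array.

```typescript
type ChatMessage = { id: number; chatId: number; texto: string };

// chatId -> id of the last message the client already has (0 if none).
type SyncCursors = Record<number, number>;

// Return, per requested chat, only the messages newer than the client's cursor.
function deltaSync(
  allMessages: ChatMessage[],
  cursors: SyncCursors,
): Record<number, ChatMessage[]> {
  const result: Record<number, ChatMessage[]> = {};
  for (const key of Object.keys(cursors)) {
    const chatId = Number(key);
    const lastKnownId = cursors[chatId] ?? 0;
    result[chatId] = allMessages
      .filter((m) => m.chatId === chatId && m.id > lastKnownId)
      .sort((a, b) => a.id - b.id); // oldest first, ready to append
  }
  return result;
}
```

A client that reconnects with cursors { 1: 2, 2: 5 } gets back only the messages of chat 1 with id above 2 and an empty list for chat 2 if nothing arrived after id 5.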
There is a known, documented wart: the user status field travels as "estado" over the REST API but as "estatus" over socket events and the presence endpoint, an inconsistency that demands care when touching either layer. As future work, the schema already reserves fields for anonymous, closed and expiring polls. On scaling, the service currently runs as a single PM2 instance, so scaling the WebSocket layer horizontally would require adding a Socket.IO adapter (e.g. Redis).
Infrastructure
On-premise deployment without containers: Node.js managed by PM2 (single instance, autorestart, 1 GB memory-restart limit, port 3000 in production). The code lives on a self-hosted GitLab (git.redgl.com); there is no CI/CD pipeline or Dockerfile (manual build and start: nest build → node dist/main). Media is stored in AWS S3 (uploads delete the previous file on replacement). Swagger is disabled in production. CORS is restricted to ERP origins. Push is delivered via Expo. The ERP is the main integrator: it issues the JWTs and calls internal endpoints to deactivate, reactivate and force vacation status for users.