Design Dropbox (Cloud File Storage)
Welcome to the definitive guide on designing a Cloud File Storage service (like Dropbox or Google Drive). Dropbox is a cloud-based service that allows users to store, share, and sync files across multiple devices securely and reliably.
In this guide, we will design this system from scratch, ensuring it can handle large files (up to 50 GB) and sync updates across devices with low latency.
Planning the Approach
Before moving on to designing the system, it is important to take a moment to plan your strategy. For product-design-style questions like Dropbox, the plan is straightforward:
- Build up your design sequentially by addressing the Functional Requirements one by one. This will help you stay focused and ensure you do not get lost in the weeds as you go.
- Rely on the Non-Functional Requirements to guide you through the architectural deep dives (e.g., handling scale, network interruptions, and latency).
[!NOTE] A Note on Blob Storage Design: Designing a Blob Storage system itself (the low-level object storage that stores raw bytes) is a related but distinct system design problem. For this interview guide, low-level Blob Storage is considered out of scope. We will use high-level object storage (like AWS S3) as a block in our design, but doing your own research on how systems like S3 are built is highly recommended.
Step 1: Requirements
Let's clearly define what is in-scope and out-of-scope for the design.
Functional Requirements
Core Requirements (In-Scope)
- Upload Files: Users should be able to upload a file from any device.
- Download Files: Users should be able to download a file from any device.
- Share Files: Users should be able to share a file with other users and view the list of files shared with them.
- Automatic Syncing: Users can automatically sync files across their registered devices.
Out of Scope
- Editing files directly within the application.
- Viewing files without downloading them (in-browser previews).
Non-Functional Requirements
Core Requirements (In-Scope)
- High Availability: The system should prioritize availability over consistency (eventual consistency).
- Large File Support: Support files up to 50 GB in size.
- Security & Reliability: Files must be securely stored and recoverable if they are lost or corrupted.
- Low Latency: Upload, download, and sync times must be as fast as possible.
Out of Scope
- Storage limits per user.
- File versioning and history.
- File scanning for viruses and malware.
The CAP Theorem Trade-off
Many candidates struggle with the CAP theorem trade-off for this question. Remember: prioritize consistency over availability only if the system would break whenever a read returns stale data, that is, when every read must see the most recent write.
- Example of Strict Consistency: In a stock trading application, if a user buys a share of AAPL in Germany and another user immediately tries to buy that same share in the US, the first transaction must be replicated to the US before the second can proceed.
- Dropbox Scenario: For a file storage system like Dropbox, it is perfectly fine if a user in Germany uploads a file and a user in the US cannot see it for a few seconds. Thus, we prioritize High Availability (AP) and accept Eventual Consistency.
Step 2: Core Entities
Let's start with a broad overview of the primary entities. In the actual interview, this can be a simple, high-level list to ensure you and the interviewer are on the same page. We will flesh out the database schema once we have a clearer grasp of the system during the high-level design.
1. File
The raw data and bytes that users upload, download, and share.
2. FileMetadata
Metadata associated with the file. At a minimum, this includes:
- id (Unique Identifier, UUID)
- name (String, e.g., report.pdf)
- size (Long integer, size in bytes)
- mimeType (String, e.g., application/pdf)
- uploadedBy (User ID reference)
3. User
The user of our system:
- id (Unique Identifier, UUID)
- email (String)
- createdAt (Timestamp)
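A minimal Python sketch of these entities as typed records. Field names are snake_case versions of the ones listed above; `new_user` and `new_file_metadata` are hypothetical helpers added for illustration.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class User:
    id: str              # UUID
    email: str
    created_at: datetime


@dataclass(frozen=True)
class FileMetadata:
    id: str              # UUID
    name: str            # e.g. "report.pdf"
    size: int            # size in bytes
    mime_type: str       # e.g. "application/pdf"
    uploaded_by: str     # references User.id


def new_user(email: str) -> User:
    """Hypothetical helper: create a user with a random UUID and a UTC timestamp."""
    return User(id=str(uuid.uuid4()), email=email, created_at=datetime.now(timezone.utc))


def new_file_metadata(name: str, size: int, mime_type: str, owner: User) -> FileMetadata:
    """Hypothetical helper: build metadata for a file owned by `owner`."""
    if size < 0:
        raise ValueError("size must be non-negative")
    return FileMetadata(id=str(uuid.uuid4()), name=name, size=size,
                        mime_type=mime_type, uploaded_by=owner.id)
```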
Step 3: API Design
The API is the primary interface users will interact with. Note that your APIs may change or evolve as you progress through the interview. You can proactively communicate this to your interviewer:
"I am going to outline some simple APIs now, but I may come back and improve them as we delve deeper into the design."
All requests should pass authentication information (like session tokens or JWTs) in the HTTP headers rather than the request body. The server derives the user's identity from the verified token instead of trusting a user ID supplied in the body, which prevents clients from impersonating other users.
1. Upload a File
Initially, a simple endpoint might look like this:
Request Body:
(Note: As we will see in the deep dives, this simple upload API will evolve significantly to handle 50 GB files using chunking and presigned URLs).
2. Download a File
Response:
- Returns the raw file bytes along with file metadata.
3. Share a File
Request Body:
4. Fetch Changes (Syncing)
Enables clients to query for updates since their last successful sync:
Response Body:
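As a hedged illustration, the four endpoints could be typed as follows. Apart from GET /files/changes?since=..., the routes and all field names are assumptions. Note that no request carries a user ID; identity comes from the authentication header.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical routes; only GET /files/changes?since=... appears in the text.
ROUTES = {
    "upload": ("POST", "/files"),
    "download": ("GET", "/files/{file_id}"),
    "share": ("POST", "/files/{file_id}/share"),
    "changes": ("GET", "/files/changes"),
}


@dataclass
class ShareRequest:
    emails: List[str]              # collaborators to share with


@dataclass
class FileChange:
    file_id: str
    updated_at: int                # server timestamp, epoch milliseconds (assumption)
    deleted: bool = False


@dataclass
class ChangesResponse:
    changes: List[FileChange] = field(default_factory=list)
    next_since: int = 0            # value the client sends as ?since= on its next call


def build_path(route: str, **params: str) -> str:
    """Render a route template, e.g. build_path("share", file_id="f1")."""
    method, template = ROUTES[route]
    return f"{method} {template.format(**params)}"
```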
Step 4: High-Level Design
To satisfy our functional requirements, we must design a system that handles both file content storage and metadata management.
Figure: High-level architecture for Dropbox showing Uploader/Downloader clients, Load Balancer & API Gateway, File Service, Metadata DB (DynamoDB), Amazon S3, and CloudFront CDN, with separate flows for the metadata control plane and the data transfer plane.
1. File Metadata Database
Our metadata is loosely structured, has few relations, and its primary query pattern is fetching files by user. We can use a NoSQL database like DynamoDB (a fully managed NoSQL database by AWS).
Our DynamoDB schema starts as a simple document:
Interviewer Tip: Do not get too caught up in making the "perfect" database choice here. A SQL database like PostgreSQL would work just as well. What matters is explaining how the database is queried.
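To show how the database is queried, here is an in-memory stand-in for the metadata table. It assumes items are keyed by file id and that a secondary index on uploadedBy (in DynamoDB, a global secondary index) serves the main "files by user" access pattern. `InMemoryMetadataTable` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set


class InMemoryMetadataTable:
    """Stand-in for the metadata table: primary key = file id, secondary index = owner."""

    def __init__(self) -> None:
        self._items: Dict[str, dict] = {}
        self._by_owner: Dict[str, Set[str]] = defaultdict(set)  # plays the role of the index

    def put(self, item: dict) -> None:
        """Insert or overwrite an item, keeping the owner index in sync."""
        old = self._items.get(item["id"])
        if old is not None:
            self._by_owner[old["uploadedBy"]].discard(item["id"])
        self._items[item["id"]] = item
        self._by_owner[item["uploadedBy"]].add(item["id"])

    def get(self, file_id: str) -> Optional[dict]:
        return self._items.get(file_id)

    def files_for_user(self, user_id: str) -> List[dict]:
        """Main access pattern: all files uploaded by a user, sorted by name."""
        ids = self._by_owner.get(user_id, set())
        return sorted((self._items[i] for i in ids), key=lambda item: item["name"])
```

With PostgreSQL, the equivalent would be an index on the uploadedBy column.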
2. Direct Upload via Presigned URLs
If we upload files through our application servers, those servers become a network I/O bottleneck, because every byte passes through them on its way to storage. Instead, we bypass the application servers using the Handling Large Blobs pattern:
- Bypassing application servers for raw data transfer.
- Using securely signed URLs for direct S3 uploads.
- Implementing chunked uploads for reliability.
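Presigned URLs can be issued without calling S3, because signing is a local computation. The sketch below uses a simplified HMAC-SHA256 scheme to show the idea; it is not AWS Signature Version 4 (which signs a more elaborate canonical request), and the URL layout, `SECRET_KEY`, and query parameter names are hypothetical. In practice you would use an SDK call such as boto3's `generate_presigned_url`.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"server-side-secret"   # hypothetical; known only to the signer and the storage side


def _signature(method: str, path: str, expires: int) -> str:
    message = f"{method}\n{path}\n{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def presign(method: str, path: str, ttl_seconds: int = 300, now: Optional[float] = None) -> str:
    """File Service: sign locally; no network call to storage is needed."""
    issued = time.time() if now is None else now
    expires = int(issued + ttl_seconds)
    query = urlencode({"expires": expires, "signature": _signature(method, path, expires)})
    return f"https://storage.example.com{path}?{query}"


def verify(method: str, url: str, now: Optional[float] = None) -> bool:
    """Storage side: recompute the signature and reject expired or tampered URLs."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    return hmac.compare_digest(_signature(method, parsed.path, expires), signature)
```

Because the method, path, and expiry are all covered by the signature, a client cannot reuse a PUT URL for another object or extend its lifetime.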
3. Sharing Files
To share a file, users enter the email of the target collaborator. We assume users are authenticated. The file service checks access permissions in a shared files table/cache before issuing download URLs.
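The permission check can be modeled as a shared-files mapping keyed by the recipient, which also answers "which files are shared with me?". This in-memory `SharingService` is a hypothetical sketch, and the rule that only the owner may share a file is an assumption.

```python
from collections import defaultdict
from typing import Dict, Set


class SharingService:
    """Hypothetical in-memory model of the shared-files table the File Service consults."""

    def __init__(self) -> None:
        self.owner_of: Dict[str, str] = {}                         # file_id -> owner user_id
        self.shared_with: Dict[str, Set[str]] = defaultdict(set)   # user_id -> file_ids
        self.user_by_email: Dict[str, str] = {}                    # email -> user_id

    def share(self, acting_user: str, file_id: str, email: str) -> None:
        if self.owner_of.get(file_id) != acting_user:
            raise PermissionError("only the owner can share this file")
        target = self.user_by_email.get(email)
        if target is None:
            raise LookupError(f"no user registered with {email}")
        self.shared_with[target].add(file_id)

    def can_read(self, user_id: str, file_id: str) -> bool:
        """Checked before issuing a download URL."""
        return self.owner_of.get(file_id) == user_id or file_id in self.shared_with.get(user_id, set())

    def files_shared_with(self, user_id: str) -> Set[str]:
        return set(self.shared_with.get(user_id, set()))
```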
4. Bi-directional Synchronization
To automatically sync files across devices, we must sync in two directions:
Local -> Remote (Upload Changes)
A client-side sync agent:
- Monitors the local Dropbox folder for changes using OS-specific file system events (e.g., FileSystemWatcher on Windows or FSEvents on macOS).
- Queues modified files locally.
- Uploads the changes to the server along with updated metadata.
- Resolves conflicts using a Last-Write-Wins (LWW) strategy (meaning the most recent edit overwrites previous versions). (Note: While versioning is out of scope, a production system would update a pointer/version index in metadata rather than overwriting raw data).
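The queueing and Last-Write-Wins steps can be sketched as below. `FileVersion`, `UploadQueue`, and the device-id tie-break are illustrative assumptions. LWW on client wall-clock timestamps is sensitive to clock skew between devices; using server-assigned timestamps is a common mitigation.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class FileVersion:
    path: str
    modified_at: float   # timestamp of the edit
    device_id: str
    content_hash: str


def last_write_wins(a: FileVersion, b: FileVersion) -> FileVersion:
    """Keep the most recent edit; break timestamp ties by device id so all replicas agree."""
    return max(a, b, key=lambda v: (v.modified_at, v.device_id))


class UploadQueue:
    """Coalesces repeated local edits so only the latest version of each path is uploaded."""

    def __init__(self) -> None:
        self._pending: "OrderedDict[str, FileVersion]" = OrderedDict()

    def on_local_change(self, version: FileVersion) -> None:
        self._pending.pop(version.path, None)   # re-queue at the back with the newer version
        self._pending[version.path] = version

    def drain(self) -> Iterator[FileVersion]:
        while self._pending:
            _, version = self._pending.popitem(last=False)   # oldest pending change first
            yield version
```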
Remote -> Local (Download Changes)
Clients need to know when files change on the server. We have two options:
- Polling: The client periodically asks the server, "Has anything changed since timestamp X?" This is simple, but a change can take up to a full polling interval to arrive, and most polls return nothing, wasting network bandwidth.
- WebSocket / SSE (Server-Sent Events): The server maintains an open connection with each client to push updates in real-time. This is faster but complex to scale.
Our Hybrid Solution:
- Active Notifications: The client maintains a single WebSocket connection per device/session (not one per file). The server pushes real-time change notifications when files change.
- Periodic Polling Fallback: WebSockets can drop, and messages can be missed. As a safety net, the client periodically polls GET /files/changes?since={timestamp} to catch missed updates and guarantee eventual consistency.
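A sketch of the hybrid approach on the client: push notifications are applied when they arrive in order, and a detected gap (or a periodic timer) triggers a poll. Where the text uses a timestamp cursor, this sketch assumes a server-assigned, monotonically increasing sequence number, which avoids missing two changes that share a timestamp. All class names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Change:
    seq: int          # server-assigned, monotonically increasing (assumption)
    file_id: str


class ChangeLog:
    """Server side: append-only log backing GET /files/changes?since=<cursor>."""

    def __init__(self) -> None:
        self._changes: List[Change] = []

    def record(self, file_id: str) -> Change:
        change = Change(seq=len(self._changes) + 1, file_id=file_id)
        self._changes.append(change)
        return change

    def since(self, cursor: int) -> Tuple[List[Change], int]:
        batch = self._changes[cursor:]            # seq == index + 1, so this is seq > cursor
        return batch, (batch[-1].seq if batch else cursor)


class SyncClient:
    def __init__(self, log: ChangeLog) -> None:
        self.log = log
        self.cursor = 0
        self.applied: List[str] = []

    def _apply(self, change: Change) -> None:
        if change.seq <= self.cursor:
            return                                # already seen via push or poll
        self.applied.append(change.file_id)       # a real client would download the file here
        self.cursor = change.seq

    def on_push(self, change: Change) -> None:
        """WebSocket notification; a gap in sequence numbers means a message was missed."""
        if change.seq == self.cursor + 1:
            self._apply(change)
        elif change.seq > self.cursor + 1:
            self.poll()

    def poll(self) -> None:
        """Fallback (also run on a timer): fetch everything after our cursor."""
        batch, _ = self.log.since(self.cursor)
        for change in batch:
            self._apply(change)
```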
Key System Components
- Uploader Client: Monitors local folders, hashes changes, and handles direct uploads.
- Downloader Client: Detects remote changes via WebSocket/Polling and pulls them down.
- LB & API Gateway: Handles routing, SSL termination, rate limiting, and request validation.
- File Service: The metadata control plane. It reads/writes file metadata, checks sharing permissions, and generates presigned URLs without touching S3 directly (generating a presigned URL is a local cryptographic signature operation).
- Amazon S3: Stores the raw file blobs.
- CDN (CloudFront): Caches files at edge locations close to users. Downloads are served via signed CDN URLs to reduce latency.
Step 5: Deep Dives
To handle files up to 50 GB and make transfers as fast and secure as possible, we must optimize the architecture.
1. Handling Large Files (Resumable Chunked Uploads)
Uploading a 50 GB file in a single POST request is practically impossible due to:
- Timeouts: A 50 GB file on a 100 Mbps connection takes over 1.1 hours to upload (50 GB × 8 bits/byte ÷ 100 Mbps = 4,000 seconds). Servers and clients will time out long before this finishes.
- Gateway Limits: Modern API Gateways impose strict request body limits (e.g., Amazon API Gateway has a hard limit of 10 MB).
- Network Interruptions: If the upload drops at 49 GB, the user must restart from scratch.
- Poor User Experience: Users cannot see a progress indicator.
The Solution: Client-Side Chunking. The client divides the file into 5-10 MB chunks before uploading them. We track the progress of each chunk, which enables progress bars.
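Because each chunk becomes an S3 multipart part, the chunk size is bounded by S3's documented limits: at most 10,000 parts per upload, and every part except the last must be at least 5 MiB (and at most 5 GiB). A 50 GiB file split into 5 MiB parts would need 10,240 parts, so the size has to grow. This sketch picks the smallest whole-MiB size that fits; the function names are hypothetical.

```python
from typing import List, Tuple

MiB = 1024 * 1024
S3_MIN_PART = 5 * MiB           # every part except the last must be at least this big
S3_MAX_PART = 5 * 1024 * MiB    # 5 GiB
S3_MAX_PARTS = 10_000


def choose_chunk_size(file_size: int, preferred: int = 5 * MiB) -> int:
    """Smallest whole-MiB chunk size >= preferred and >= 5 MiB that needs <= 10,000 parts."""
    needed = -(-file_size // (S3_MAX_PARTS * MiB)) * MiB   # ceiling division, rounded to MiB
    size = max(preferred, S3_MIN_PART, needed)
    if size > S3_MAX_PART:
        raise ValueError("file too large for a single multipart upload")
    return size


def plan_chunks(file_size: int, chunk_size: int) -> List[Tuple[int, int, int]]:
    """Return (part_number, offset, length) triples; S3 part numbers start at 1."""
    return [
        (i + 1, offset, min(chunk_size, file_size - offset))
        for i, offset in enumerate(range(0, file_size, chunk_size))
    ]
```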
Managing Resumability with Fingerprints
To resume an upload, we need to know: (1) Have we attempted this file upload before? and (2) Which chunks are already uploaded?
We cannot rely on file names, which are not unique. Instead, we use a fingerprint: a SHA-256 hash computed from the file content.
- Fingerprints identify content, not records. Two users uploading the same file produce the same fingerprint.
- We store this fingerprint in our FileMetadata table with a list of chunk states:
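The whole-file fingerprint and the per-chunk states can be computed in one streaming pass with hashlib, so a 50 GB file never has to fit in memory. The record layout (fields such as `chunks`, `status`, and `etag`) is an assumed stand-in for the schema described above.

```python
import hashlib
import io
from typing import BinaryIO, Dict, List


def fingerprint_file(stream: BinaryIO, chunk_size: int) -> Dict:
    """Read the file once, computing a whole-file SHA-256 and one SHA-256 per chunk."""
    whole = hashlib.sha256()
    chunks: List[Dict] = []
    part_number = 1
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        whole.update(data)
        chunks.append({
            "partNumber": part_number,
            "fingerprint": hashlib.sha256(data).hexdigest(),
            "status": "not-uploaded",   # becomes "uploaded" once its ETag is verified
            "etag": None,
        })
        part_number += 1
    return {"fingerprint": whole.hexdigest(), "status": "uploading", "chunks": chunks}
```

In a real client the stream would come from `open(path, "rb")`.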
The Multipart Upload Protocol
- Initiate: The client hashes the file and requests a resumable upload. If the fingerprint already exists in DynamoDB, the client receives the list of chunks that are already uploaded. If it is new, the server calls S3's CreateMultipartUpload to get an uploadId and saves the metadata with status uploading.
- Transfer: The server returns presigned S3 URLs for each missing part. The client uploads the chunks to S3 in parallel, each using the uploadId and its partNumber.
- Verify & Patch: When a chunk completes, the client sends a PATCH request with the chunk's S3 ETag. The backend verifies the chunk with S3's ListParts API and marks the chunk status as uploaded in the DB.
- Complete: Once all chunks are marked uploaded, the backend calls S3's CompleteMultipartUpload with the ordered list of part numbers and ETags. S3 stitches the chunks together into a single object, and the DB status updates to uploaded.
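The Initiate, Verify & Patch, and Complete steps can be modeled as a small server-side state machine. `FakeS3` is a hypothetical in-memory stand-in for CreateMultipartUpload, UploadPart, ListParts, and CompleteMultipartUpload; a real backend would call those APIs through an SDK, and the MD5 "ETags" here are only for illustration.

```python
import hashlib
import uuid
from typing import Dict, List, Tuple


class FakeS3:
    """In-memory stand-in for the S3 multipart API (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.uploads: Dict[str, Dict[int, bytes]] = {}
        self.objects: Dict[str, bytes] = {}

    def create_multipart_upload(self, key: str) -> str:
        upload_id = str(uuid.uuid4())
        self.uploads[upload_id] = {}
        return upload_id

    def upload_part(self, upload_id: str, part_number: int, body: bytes) -> str:
        self.uploads[upload_id][part_number] = body
        return hashlib.md5(body).hexdigest()          # stand-in ETag

    def list_parts(self, upload_id: str) -> Dict[int, str]:
        return {n: hashlib.md5(b).hexdigest() for n, b in self.uploads[upload_id].items()}

    def complete_multipart_upload(self, key: str, upload_id: str,
                                  parts: List[Tuple[int, str]]) -> None:
        stored = self.uploads.pop(upload_id)
        self.objects[key] = b"".join(stored[n] for n, _ in sorted(parts))


class UploadService:
    """Server-side Initiate / Verify & Patch / Complete, keyed by file fingerprint."""

    def __init__(self, s3: FakeS3) -> None:
        self.s3 = s3
        self.db: Dict[str, dict] = {}                 # fingerprint -> metadata record

    def initiate(self, fingerprint: str, num_parts: int) -> List[int]:
        """Return the part numbers the client still has to upload (all of them if new)."""
        record = self.db.get(fingerprint)
        if record is None:
            record = {
                "uploadId": self.s3.create_multipart_upload(fingerprint),
                "status": "uploading",
                "parts": {n: None for n in range(1, num_parts + 1)},   # part -> verified ETag
            }
            self.db[fingerprint] = record
        return [n for n, etag in record["parts"].items() if etag is None]

    def patch_part(self, fingerprint: str, part_number: int, etag: str) -> None:
        """Only mark a part uploaded if S3 reports the same ETag; complete when all are in."""
        record = self.db[fingerprint]
        if self.s3.list_parts(record["uploadId"]).get(part_number) != etag:
            raise ValueError(f"part {part_number} not found in S3 with ETag {etag}")
        record["parts"][part_number] = etag
        if all(record["parts"].values()):
            parts = sorted(record["parts"].items())
            self.s3.complete_multipart_upload(fingerprint, record["uploadId"], parts)
            record["status"] = "uploaded"
```

If the connection drops after some parts, calling `initiate` again returns only the missing part numbers, which is what makes the upload resumable.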
[!IMPORTANT] Interviewer Tip on AWS S3 APIs: With S3 multipart uploads, event notifications only trigger when the entire multipart upload is completed (when all parts are assembled), not for individual part uploads. To track individual part progress, you must use S3's ListParts API, which returns the parts uploaded so far for an in-progress upload along with their ETags (paginated, up to 1,000 parts per response). In your interview, you cannot just say, "I'd use the S3 Multipart Upload API" without explaining how it works and how you would implement it yourself if you had to.
What about Downloads?
We do not need chunked downloads. Once S3 stitches the parts together, it acts as a single object. Clients download it via a single presigned/signed CDN URL. For large files, clients use HTTP Range Requests to download byte ranges in parallel or resume interrupted downloads without knowing the original chunk boundaries.
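Parallel and resumed downloads only need standard HTTP Range headers: byte ranges are inclusive, and the server answers 206 Partial Content. S3 and CloudFront both honor them. The helper names below are hypothetical.

```python
from typing import List


def byte_ranges(total_size: int, num_workers: int) -> List[str]:
    """Split [0, total_size) into contiguous Range header values for parallel download."""
    if total_size <= 0 or num_workers <= 0:
        return []
    span = -(-total_size // num_workers)          # ceiling division
    ranges = []
    for start in range(0, total_size, span):
        end = min(start + span, total_size) - 1   # HTTP byte ranges are inclusive
        ranges.append(f"bytes={start}-{end}")
    return ranges


def resume_range(bytes_already_written: int) -> str:
    """Request only the remainder of an interrupted download (open-ended range)."""
    return f"bytes={bytes_already_written}-"
```

Each value is sent as the `Range` request header; none of them depend on the original upload's chunk boundaries.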
2. High-Performance Upload, Download, and Sync
To make the system as fast as possible, we apply three main optimizations:
Content-Defined Chunking (CDC)
With fixed-size chunking (e.g., exactly 5 MB), inserting a single byte at the start of a file shifts all subsequent chunk boundaries, giving every chunk a new fingerprint and making delta sync useless.
- The Solution: Content-Defined Chunking (CDC) uses a rolling hash (like Rabin Fingerprinting) to locate chunk boundaries dynamically based on content rather than fixed sizes.
- A small edit only moves the chunk boundaries near the edit. The remaining chunks keep their fingerprints, so only the chunks around the edit need to be uploaded. This is a common way for sync systems to achieve efficient delta sync.
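A minimal content-defined chunker. For simplicity it uses a polynomial rolling hash (Rabin-Karp style) over a 48-byte window rather than a true Rabin fingerprint over GF(2). The window, mask, and min/max sizes are illustrative assumptions, scaled down to roughly 5 KB average chunks so the example runs quickly.

```python
import hashlib
from typing import List

BASE = 257
MOD = (1 << 61) - 1          # large Mersenne prime modulus


def cdc_chunks(data: bytes, window: int = 48, mask: int = (1 << 12) - 1,
               min_size: int = 1024, max_size: int = 16 * 1024) -> List[bytes]:
    """Cut a chunk wherever the hash of the last `window` bytes has its low 12 bits all zero."""
    pow_out = pow(BASE, window - 1, MOD)   # weight of the byte sliding out of the window
    chunks: List[bytes] = []
    start = 0
    h = 0   # not reset at boundaries, so a cut depends only on the preceding `window` bytes
    for i, byte in enumerate(data):
        if i >= window:
            h = (h - data[i - window] * pow_out) % MOD   # drop the oldest byte
        h = (h * BASE + byte) % MOD                      # add the newest byte
        length = i + 1 - start
        if (length >= min_size and (h & mask) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def chunk_fingerprints(chunks: List[bytes]) -> List[str]:
    return [hashlib.sha256(c).hexdigest() for c in chunks]
```

After inserting a single byte near the start, the boundaries re-align within a chunk or two, so most fingerprints are unchanged; with fixed 5 MB chunks, every chunk after the insertion would change.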
Client-Side Compression
We compress text files, which often compress well (e.g., a 5 GB text file might shrink to around 1 GB), but skip media files such as PNG or MP4, which are already compressed, so recompressing them yields little benefit and wastes CPU.
- We use fast, modern algorithms like Zstandard (zstd) or Brotli on the client before uploading.
- Security Ordering: Always compress before encrypting. Encryption introduces high randomness, which destroys the patterns needed for compression algorithms to work.
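A sketch of the client-side decision, using the standard library's zlib as a stand-in for Zstandard or Brotli. The list of already-compressed MIME types is illustrative, and `os.urandom` only simulates ciphertext, which looks random to a compressor.

```python
import os
import zlib

# Formats that are already compressed; recompressing them wastes CPU (illustrative list).
ALREADY_COMPRESSED = {"image/png", "image/jpeg", "video/mp4", "application/zip"}


def prepare_payload(data: bytes, mime_type: str) -> tuple:
    """Compress compressible content; the caller encrypts the returned bytes afterwards."""
    if mime_type in ALREADY_COMPRESSED:
        return data, False
    compressed = zlib.compress(data, level=6)
    if len(compressed) >= len(data):     # no gain: send the original bytes
        return data, False
    return compressed, True


if __name__ == "__main__":
    text = b"quarterly report: revenue up, costs down. " * 2000
    ciphertext_like = os.urandom(len(text))   # stand-in for already-encrypted bytes
    print("text:", len(text), "->", len(prepare_payload(text, "text/plain")[0]))
    print("ciphertext:", len(ciphertext_like), "->",
          len(prepare_payload(ciphertext_like, "application/octet-stream")[0]))
```

Compressing the plaintext first shrinks it dramatically, while the random-looking "ciphertext" does not shrink at all, which is why the order matters.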
Parallel and Adaptive Transfers
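We upload chunks in parallel and dynamically adjust chunk sizes based on current network bandwidth and packet loss (e.g., between 10 MB and 5 MB; S3 requires every multipart part except the last to be at least 5 MiB, so chunks cannot shrink below that).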
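A sketch of parallel part uploads with a thread pool and per-part retries. `upload_part` is a hypothetical callable (for example, a PUT to a presigned part URL), and the sizing rule is an illustrative heuristic that stays within S3's 5 MiB minimum part size, not a documented Dropbox algorithm.

```python
import concurrent.futures as cf
from typing import Callable, Dict

MiB = 1024 * 1024


def next_chunk_size(current: int, loss_rate: float, lo: int = 5 * MiB, hi: int = 10 * MiB) -> int:
    """Illustrative heuristic: halve parts on lossy links (cheaper retries), grow on clean ones."""
    if loss_rate > 0.01:
        return max(lo, current // 2)
    return min(hi, current + MiB)


def upload_parallel(parts: Dict[int, bytes], upload_part: Callable[[int, bytes], str],
                    max_workers: int = 4, max_attempts: int = 3) -> Dict[int, str]:
    """Upload parts concurrently and return part_number -> ETag; raise if a part keeps failing."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def with_retries(number: int, body: bytes) -> str:
        last_error: Exception = OSError("no attempt made")
        for _ in range(max_attempts):
            try:
                return upload_part(number, body)
            except OSError as err:      # transient network failure: retry only this part
                last_error = err
        raise last_error

    with cf.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(with_retries, n, body): n for n, body in parts.items()}
        return {futures[f]: f.result() for f in cf.as_completed(futures)}
```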
3. File Security and Access Control
- Encryption in Transit: All traffic between clients, API gateways, S3, and CloudFront is encrypted using HTTPS (TLS).
- Encryption at Rest: Files are encrypted at rest in S3 using unique KMS keys.
- CDN Signed URLs: Downloads are served via CloudFront Signed URLs. These are short-lived bearer tokens (e.g., 5-minute expiration) generated using a private key and verified by the CDN using a public key.
- Validation Flow:
- Generation: Server creates a signed URL with path, expiration timestamp, and optional client IP restrictions.
- Distribution: Authorized user receives the URL.
- Validation: The CDN verifies the signature using its public key. If valid and not expired, it serves the file from cache (or fetches from S3 on a miss). Otherwise, it rejects the request at the edge.
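The edge-side validation flow can be sketched as follows. The signature check is injected as a `verify_signature` callable because CloudFront uses public-key (RSA) signatures that the Python standard library cannot verify. The policy fields (`expires`, `allowed_ip`), the policy string format, and the function names are hypothetical and do not reproduce CloudFront's actual policy format.

```python
import ipaddress
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class SignedRequest:
    path: str
    expires: int                 # epoch seconds
    allowed_ip: Optional[str]    # optional CIDR restriction, e.g. "203.0.113.0/24"
    signature: bytes
    client_ip: str


def edge_decision(req: SignedRequest, now: int,
                  verify_signature: Callable[[bytes, bytes], bool],
                  cache: dict, fetch_origin: Callable[[str], bytes]) -> tuple:
    """Return (status, body). Rejections happen at the edge, before the origin is contacted."""
    policy = f"{req.path}|{req.expires}|{req.allowed_ip or ''}".encode()
    if not verify_signature(policy, req.signature):
        return 403, b"bad signature"
    if now > req.expires:
        return 403, b"expired"
    if req.allowed_ip and ipaddress.ip_address(req.client_ip) not in ipaddress.ip_network(req.allowed_ip):
        return 403, b"ip not allowed"
    if req.path not in cache:                 # cache miss: fetch from S3 (the origin) once
        cache[req.path] = fetch_origin(req.path)
    return 200, cache[req.path]
```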
What is Expected at Each Level?
Mid-level (E4 / L4)
- Breadth over Depth: Focus on a functional end-to-end design (80% breadth, 20% depth). You should define clear APIs, a basic schema, and high-level flows.
- Probing the Basics: Be ready to explain what each component (like an API Gateway or Load Balancer) does. The interviewer will test if you understand the fundamentals.
- Collaborative Control: You should drive the requirements and initial design, but expect the interviewer to lead the deep dives.
- The Dropbox Bar: Land on a high-level design that works for uploading, downloading, and sharing. You are not expected to know AWS APIs (like S3 Multipart or CloudFront Signed URLs) off the top of your head, but you must be able to reason through chunking, resumability, and basic HTTP redirection (302 redirects) when prompted.
Senior / Staff (E5+ / L5+)
- Proactive Deep Dives: You should proactively identify issues (like body size limits on API Gateways, delta sync boundaries, or the security implications of bearer tokens).
- Architectural Depth: Demonstrate expert knowledge in Rabin fingerprinting, rolling hashes, delta sync, Zstandard trade-offs, and edge authorization.
- Driving the Interview: Own the conversation from start to finish, steering the discussion smoothly from requirements to low-level optimization without hand-holding.