
Complete Delete: In Practice, Clicking 'Delete' Rarely Deletes. Should it?

System designers must decide whether to implement systems that never delete anything, that delete local copies but leave remote backups, or that truly delete sensitive data everywhere, or whether the actual deletion of individual files or photos will be left to chance.

Published on Sep 11, 2024

You're viewing an older Release (#2) of this Pub.

  • This Release (#2) was created on Sep 11, 2024.
  • The latest Release (#3) was created on Oct 15, 2024.

Abstract

Without explicit engineering to ensure complete deletion, many of the files, photographs, database records, and other information deleted by end-users can be readily recovered for an indeterminate period of time. In part, this is because each file is copied to multiple locations, and in part, this is because deleted copies are rarely overwritten on storage media, allowing their contents to be “undeleted” using digital forensics tools. This was not an issue in early computer systems, as they had limited storage and quickly overwrote unallocated storage blocks. Modern systems proactively make many copies of data to improve performance and provide for error recovery, forcing users and system designers to consider how deletion should work in principle. Cryptographic erasure makes deletion more predictable, making it possible to align the user experience of deleting and undeleting files with the actual impact on data erasure and recovery. Ultimately, system designers must decide whether to implement systems that never delete anything, that delete local copies but leave remote backups, or that truly delete sensitive data everywhere, or whether the actual deletion of any individual file or photo will be left to chance.


Keywords: crypto shredding, cryptographic erasure, data governance, law enforcement, mass storage, operating systems, privacy, remnant data, sexting, system design, usability

Simson Garfinkel 
John A. Paulson School of Engineering and Applied Sciences, Harvard University

Learning Objectives

  • Discuss why the way that systems implement erasure is a policy option, and list possible policies.

  • Explain cryptographic erasure.

  • Discuss the conflict between the requirement to make data available and the requirement to completely delete it.

  • Diagram the movement of a digital photograph from its recording by a cell phone through distribution, storage, and ultimately its erasure.

  • Assemble a list of erasure policies in consumer computing systems as evidenced from documentation or user interfaces.

1. Introduction: Where Do Digital Files Live, and Where Do They Go When They Die?

Imagine that you are working on a research project about computer storage systems. Knowing about your interest, a friend sends you a link to the Wikipedia page describing the IBM 350 RAMAC, the world’s first commercial hard drive (Figure 1). The RAMAC stored 5 megabytes of data in total on fifty-two 24-inch diameter disks in a cabinet that was 60 inches long, 68 inches high, and 29 inches wide. IBM introduced the system in 1956, renting it to customers for $3,200 per month. The company withdrew the product in 1969.

Figure 1

The IBM 350 RAMAC at the Computer History Museum in Mountain View, California. Image source: Silicon Valley Sleuth blog, available via Wikimedia Commons.

You click on the link, and moments later your cell phone displays the RAMAC Wikipedia page and a photo of one of the remaining RAMAC storage mechanisms at the Computer History Museum. The photo was taken on September 12, 2006, with a Canon PowerShot A95, cropped, and uploaded to Wikipedia the following day. That file now sits on a server in a data center. When you clicked on the Wikipedia link, your phone contacted a computer operated by Wikipedia, downloaded a copy of the image, saved that copy to the phone’s flash storage, and finally displayed the image.

In the few tenths of a second that elapsed between the time that you clicked on the link and when the phone displayed the image, at least three complete copies of the RAMAC digital image were created in the world: one in the memory of the computer that sent you the image, a second in your cell phone’s memory, and a third in your phone’s browser ‘cache,’ that special place where your cell phone stored its own copy (see the Glossary). If you press your finger on the displayed image and then touch ‘save to photos,’ your phone will store another copy in your photo library, and yet another copy will probably be uploaded to another computer somewhere in the cloud.

This case study is about what happens to all those copies when they are no longer needed.

1.1. Characters, Bytes, and Blocks

Before the RAMAC, companies stored their data on punch cards: pieces of cardboard that measured 7⅜ inches by 3¼ inches and stored just eighty characters using codes involving tiny rectangular holes (see Figure 2). The US Social Security Administration had a card for each worker’s account; information could be added each year by punching the new data into the empty columns. Correcting data required creating an entirely new card and destroying the old one.

Figure 2

A deck of IBM punched cards, compiled circa 1969. Image source: Arnold Reinhold, available via Wikimedia Commons.

What made the RAMAC revolutionary was that it had ‘rewritable random-access storage’ (see the Glossary). Correcting data no longer required punching new cards: data could be rewritten in place. Data could also be updated. The auto manufacturer Chrysler took delivery of the first RAMAC in 1957 to manage its complex inventory and order processing system, which had previously resided on many decks of punched cards.

In computing, the word ‘storage’ is used to describe any kind of system that allows information to be stored (called ‘writing’) and recovered (called ‘reading’). ‘Random access’ means that the data can be written and read in any order, unlike a deck of punched cards that can only be read or written sequentially, one card at a time. Finally, the RAMAC’s magnetic storage was ‘rewritable,’ which meant that each of its five million characters of information could be rewritten any number of times. Those five million characters were arranged in 100-character chunks, called ‘records,’ in which each character had 7 bits of data and 1 bit for detecting errors. A single 100-character record was the smallest amount of data that the RAMAC could read or write at a time. Today, we call this the ‘block size’ (see the Glossary).

It’s easy to destroy data that’s on a punch card: burn it. Shredding also works, as does putting the card back into the puncher and punching all of the holes.

Physical destruction also destroys the data stored on the RAMAC, of course, but that would have been quite wasteful. Because the media is rewritable, the correct way to destroy data was to overwrite old data with new data. For example, if each block represents a person on a mailing list and that person asks to be removed, one way to satisfy the request would have been to overwrite their corresponding record with blanks. The next time the computer needed to store information about a person, it could find the first blank record and overwrite the blanks with data for the new person. This approach was reasonably fast, efficient, easy to understand, easy to program, and easy to verify. In fact, overwriting old data with new data is still a common way of updating information in modern computer systems.
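The overwrite-and-reuse approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `RecordStore` class, its method names, and the sample mailing-list entries are hypothetical, and a `bytearray` stands in for the rewritable magnetic media. The 100-character record size is the one the text gives for the RAMAC.

```python
RECORD_SIZE = 100                 # characters per record, as described for the RAMAC
BLANK = b" " * RECORD_SIZE        # a record that holds only blanks counts as empty


class RecordStore:
    """Hypothetical store of fixed-size records on one rewritable byte array."""

    def __init__(self, num_records: int) -> None:
        self.num_records = num_records
        self.media = bytearray(BLANK * num_records)

    def read(self, index: int) -> bytes:
        start = index * RECORD_SIZE
        return bytes(self.media[start:start + RECORD_SIZE])

    def write(self, index: int, text: str) -> None:
        data = text.encode("ascii")
        if len(data) > RECORD_SIZE:
            raise ValueError("record too long")
        start = index * RECORD_SIZE
        # Pad with blanks so that every record occupies exactly RECORD_SIZE bytes.
        self.media[start:start + RECORD_SIZE] = data.ljust(RECORD_SIZE)

    def delete(self, index: int) -> None:
        # Deletion is an overwrite: the old characters are replaced by blanks.
        start = index * RECORD_SIZE
        self.media[start:start + RECORD_SIZE] = BLANK

    def add(self, text: str) -> int:
        # Store a new entry in the first blank record and report where it went.
        for index in range(self.num_records):
            if self.read(index) == BLANK:
                self.write(index, text)
                return index
        raise RuntimeError("no blank record available")


store = RecordStore(3)
slot_a = store.add("ALICE, 1 MAIN ST")
slot_b = store.add("BOB, 2 OAK AVE")
store.delete(slot_a)                      # Alice asks to be removed
slot_c = store.add("CAROL, 3 ELM RD")     # the blanked record is reused
```

Because `delete` overwrites the record itself, nothing about the removed person remains anywhere in `store.media`, which is what makes this scheme easy to verify.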

1.2. Modern Files and File Systems

Things have changed a lot since the 1950s. Today’s computers use far more sophisticated approaches for error detection and correction, so modern storage systems store 8-bit bytes rather than 7-bit characters and a separate check bit. Renting storage has also grown cheaper: in August 2024, IBM’s ‘Free Tier’ provided users with 5 GB free per month in the IBM Cloud Object Storage system, which is 1,000 times more storage than the RAMAC offered, and each additional gigabyte (200 times the RAMAC’s capacity) cost roughly 2 cents per month. Storage systems are also physically smaller: a micro SDXC memory card the size of an adult’s fingernail can hold 1 TB of data at a cost of only $60. Modern storage devices still store data in discrete chunks, although now they are called ‘blocks’ or ‘sectors’ and are typically 512 or 4,096 bytes long rather than 100.

However, even though modern computers store data in blocks, it’s now rare for either users or programmers to encounter them. Instead, users and programmers alike work with abstractions called ‘files.’ Conceptually, a file is a sequence of zero or more bytes. The digital image of the RAMAC 350 that we downloaded from Wikipedia is a file that is precisely 42,363 bytes long. Although user interfaces like the Windows File Explorer, the Macintosh Finder, and Google Drive typically show files having names and modification dates, this information is not part of the file. Instead, ‘metadata’ like file names and modification dates are stored separately in a place called a ‘directory’ or a ‘folder.’ Together, the collection of file contents, file names, directories, and other information is called a ‘file system.’

Figure 3

The mass storage system of a modern cell phone or laptop consists of flash memory that is divided into individual blocks, typically 4,096 bytes in size. Some blocks hold end-user data like digital photographs. Together, the blocks in a digital photograph create a file (in this case, IMG_0037.JPG). The file’s name, its creation date, and other information are stored in another group of blocks called a directory, which points to the file (and to other files that are in the same directory). That directory, in turn, is pointed to by another directory, which is pointed to by another. Blocks that do not hold any image are free for future use. The collection of all these blocks and their contents is called a file system. On many computers, files are deleted by removing their names from their containing directories and adding them to the free list, but the file contents are not overwritten, allowing them to be recovered using digital forensics tools.

Confusingly, the term file system is also used to describe the software within the computer’s operating system that is responsible for deciding the specific blocks on the mass storage device used to store the file’s contents. Those 42,363 bytes take up 11 blocks (assuming a 4,096-byte block size). Typically, those blocks are sequentially stored on the storage device, but this isn’t necessarily the case. The file system contains functions that allow programs to open files, read or write bytes to the file, close the file when done, and delete the file when it is no longer needed. We can differentiate these two uses of the term by referring to the software as ‘the’ file system and the arrangement of data as ‘a’ file system.
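The block count quoted above follows from rounding the file size up to a whole number of blocks. A minimal worked example, in which the helper name `blocks_needed` is an illustrative assumption:

```python
def blocks_needed(file_size: int, block_size: int = 4096) -> int:
    """Number of whole blocks required to hold file_size bytes."""
    # Integer ceiling division: a partly filled final block still uses a full block.
    return -(-file_size // block_size)


ramac_image_bytes = 42_363
print(blocks_needed(ramac_image_bytes))        # 11 blocks of 4,096 bytes
print(blocks_needed(ramac_image_bytes, 512))   # 83 blocks of 512 bytes
```

The last block is only partly filled by the file, but the storage device still reads and writes it as a whole.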

1.3. Remnant Data and Cryptographic Erasure

Most of us have had the experience of taking a digital photograph that we didn’t like, or that we shouldn’t have taken, and immediately deleting the image. But what happens to the digital photograph’s data after you click “delete” depends on many factors. Here are two possibilities:

  1. If you took the photograph with a digital camera, the camera likely stored the digital photograph on a secure digital memory card (SD card). When you snapped the picture, the digital photograph was stored in a file, and the name of that file was stored in a directory. When you clicked delete, the name of the file was removed from the directory.

  2. If you took the photograph with an Apple iPhone using the built-in camera app, the digital photograph was stored in the phone’s nonvolatile flash memory in part of the file system used by Apple’s Photos app. At the same time, the phone started uploading the photo to Apple’s iCloud. When you clicked delete, the Photos app moved the photo to a special album called ‘recently deleted’ on both the phone and in iCloud. After 30 days, the photo will be removed from the ‘recently deleted’ album, and the actual files will be deleted from both the phone and from Apple’s servers.

‘Remnant data’ are data that remain after there has been a deletion attempt (see the Glossary). In the two examples above, the first deletion attempt results in remnant data, while the second does not. Here’s why:

  1. Most digital cameras use SD cards formatted using the Microsoft file allocation table (FAT) file system. As its name implies, FAT stores a table on the media that has the name of every file and where it is stored. It also labels every block on the card as to whether it is in a file or free to use. When a file is deleted, the file’s blocks are marked as ‘free,’ available for reuse, but they are not overwritten until new images are stored. Using special file recovery or digital forensics tools, it is possible to recover the digital images from the SD card if the individual blocks have not yet been overwritten.

  2. Apple’s iPhone uses Apple’s proprietary Apple File System (APFS). This file system encrypts every user file on the iPhone with a file-specific encryption key. APFS uses hardware built into the iPhone so that data blocks written to the flash memory are automatically encrypted, while data blocks read back are automatically decrypted. This happens without any impact on the user’s experience. When the file is deleted, the key is erased. Without the key, it is impossible to recover the unencrypted contents of the file. This process is called ‘cryptographic erasure’ or ‘crypto shredding.’ Many cloud providers use a similar process for securely erasing data from the cloud.
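The first case can be illustrated with a toy model of a memory card. This is a sketch under loud assumptions: `ToyCard` is hypothetical, it is not the real on-disk FAT format, and the 16-byte block size is chosen only to keep the example small. It shows the one behavior that matters here: deleting a file removes its name and marks its blocks free, but leaves the contents in place until the blocks are reused.

```python
BLOCK_SIZE = 16   # tiny blocks keep the demo readable; real cards use 512 bytes or more


class ToyCard:
    """Hypothetical memory card: raw blocks, a free map, and a directory."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]
        self.free = [True] * num_blocks
        self.directory = {}   # file name -> (block numbers, length in bytes)

    def write_file(self, name: str, data: bytes) -> None:
        needed = -(-len(data) // BLOCK_SIZE)   # round up to whole blocks
        chosen = [i for i, is_free in enumerate(self.free) if is_free][:needed]
        if len(chosen) < needed:
            raise RuntimeError("card is full")
        for n, block_no in enumerate(chosen):
            chunk = data[n * BLOCK_SIZE:(n + 1) * BLOCK_SIZE]
            self.blocks[block_no] = chunk.ljust(BLOCK_SIZE, b"\0")
            self.free[block_no] = False
        self.directory[name] = (chosen, len(data))

    def delete_file(self, name: str) -> None:
        chosen, _length = self.directory.pop(name)   # the name disappears
        for block_no in chosen:
            self.free[block_no] = True               # the contents are NOT overwritten

    def carve_free_blocks(self) -> bytes:
        """What a recovery tool could read: the raw contents of free blocks."""
        return b"".join(b for b, is_free in zip(self.blocks, self.free) if is_free)


card = ToyCard(8)
card.write_file("IMG_0001.JPG", b"embarrassing photo bytes")
card.delete_file("IMG_0001.JPG")
recoverable_after_delete = b"embarrassing photo" in card.carve_free_blocks()

card.write_file("IMG_0002.JPG", b"x" * 32)   # a new photo reuses the freed blocks
recoverable_after_reuse = b"embarrassing" in card.carve_free_blocks()
```

In this model the deleted photo is recoverable right after deletion and gone only once a later write happens to land on the same blocks, which is why recovery from a real card depends on how much has been stored since.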

1.4. Automatic Backups and Time Machine Computing

Even with crypto shredding, it is still possible to have copies of that digital photo circulating after it was deleted. For example, there might be software running on the iPhone that automatically copies every photograph taken to a second cloud storage service, like Google Drive, Microsoft OneDrive, or Dropbox. Because these programs are frequently used to supplement the relatively small amount of storage on mobile phones, the services do not automatically erase photos from the cloud when they are erased from the phone.

The situation is potentially even more complicated if the iPhone user has a desktop computer that is also running an Apple operating system. If that computer is turned on and connected to the internet, it may automatically download a copy of the digital photograph once the photo has been uploaded to iCloud.

In October 2007, Apple introduced an easy-to-use backup system for home users called Time Machine. The backup system protects all files on the user’s computer, copying all new files to an external storage device every hour. Backups are stored until the external drive is filled; Time Machine then selectively deletes older files so that there is always a good selection of files that were recently changed as well as those that were deleted weeks or even months in the past.

Apple created Time Machine to solve an important problem for computer users: the accidental loss of important data. Many users fail to make backup copies of their data and consequently lose their data if it is inadvertently deleted or if there is a hardware failure. With Time Machine, users do not need to think about backup strategies; everything gets backed up.

Now consider a photo that the iPhone snapped that is then copied to iCloud, copied to the user’s desktop computer, and finally written to a Time Machine backup. Even if this photo is deleted from the phone, iCloud, and the desktop, it will still be present on the backup for weeks or even months without the user’s knowledge. If the backup drive is disconnected and packed away in a storage locker, the digital photo might never be deleted.

2. Complete Delete

‘Complete Delete’ is a system design pattern that “ensure[s] that when the user deletes the visible representation of something, the hidden representations are deleted as well.”1

Several of the examples above either directly implement the Complete Delete pattern or could be readily modified to do so. For example, if the iPhone is not connected to the internet, a photo is deleted from the primary photo album, and then it is deleted from the ‘recently deleted’ photo album, that second delete is a Complete Delete; the cryptographic erasure ensures that there is no way to recover the photo from the iPhone’s physical storage.

Apple’s Time Machine gives users the ability to delete a file from all of the various snapshots stored on the backup volume, but this does not happen automatically: the user must explicitly delete a specific file or folder. Complicating matters, the deletion only happens if the backup volume is online, that is, if the drive is actually connected to the computer and operating. If the backup volume is offline (for example, in a storage locker), the files on it cannot be deleted.

It might seem that any system that allows data to be backed up to offline storage cannot provide Complete Delete, but cryptographic erasure provides a solution to this quandary. A ‘revocable backup system’ encrypts each file before it is backed up with a file-specific key.2 Unlike APFS, the key is not stored on the backup media but instead is stored in a secure database. Deleting the key renders the backup useless.
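The idea of a revocable backup can be sketched as follows. Everything here is an assumption made for illustration: the `KeyStore` and `RevocableBackup` classes are hypothetical, and `toy_cipher` is a deliberately simple stand-in built from SHA-256 so that the example runs with only the Python standard library. It is not secure; a real system would use a vetted cipher such as AES from an established cryptography library.

```python
import hashlib
import secrets


def _keystream(key: bytes, length: int) -> bytes:
    # Toy keystream: SHA-256 of the key plus a counter. For illustration only.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


class KeyStore:
    """Stand-in for the secure key database, kept apart from the backup media."""

    def __init__(self) -> None:
        self._keys = {}

    def new_key(self, file_id: str) -> bytes:
        key = secrets.token_bytes(32)   # one random key per file
        self._keys[file_id] = key
        return key

    def get(self, file_id: str) -> bytes:
        return self._keys[file_id]      # raises KeyError once the key is revoked

    def revoke(self, file_id: str) -> None:
        self._keys.pop(file_id, None)


class RevocableBackup:
    def __init__(self, key_store: KeyStore) -> None:
        self.key_store = key_store
        self.media = {}   # file_id -> ciphertext; this part may sit in a storage locker

    def back_up(self, file_id: str, plaintext: bytes) -> None:
        key = self.key_store.new_key(file_id)
        self.media[file_id] = toy_cipher(key, plaintext)

    def restore(self, file_id: str) -> bytes:
        return toy_cipher(self.key_store.get(file_id), self.media[file_id])


keys = KeyStore()
backup = RevocableBackup(keys)
backup.back_up("IMG_0037.JPG", b"photo bytes")
restored = backup.restore("IMG_0037.JPG")

keys.revoke("IMG_0037.JPG")   # the "delete": only the key is destroyed
try:
    backup.restore("IMG_0037.JPG")
    still_restorable = True
except KeyError:
    still_restorable = False
```

Note that the ciphertext is still present in `backup.media` after revocation. The design relies on the key database being online and trustworthy even when the backup media is not.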

The same idea is used by ‘digital rights management’ (DRM) systems such as Microsoft’s Information Rights Management, which is now built into Microsoft Word.3 These systems make it possible for authors to specify the specific people who are allowed to read or change a specific document. The document is then saved in an encrypted form. If the file is emailed to Pat and Pat attempts to open the file, Pat’s computer contacts the DRM server, proves Pat’s identity (perhaps with a password), and obtains the decryption key. If the document’s owner wants to prevent Pat from accessing the document in the future, all the owner needs to do is to revoke Pat’s access. This deletes the key, and Pat can no longer open the file.

Apple’s Time Machine doesn’t implement cryptographic erasure or digital rights management, although it could.

Computer security experts have long understood that residual data poses a threat to both privacy and security. Nevertheless, except for the per-file encryption available on Apple’s APFS, most computer users and IT professionals have largely ignored the issue for the past seven decades. Here are some possible reasons why:

  • Storage was so limited for the first 50 years of computing that deleted files were soon overwritten with new data.

  • Today’s large storage devices and virtually unlimited cloud storage mean that it is very easy to inadvertently create and distribute many copies of the same digital object, such that when one copy is erased, others remain.

  • Until the introduction of camera phones and home video surveillance systems, few computer users created large amounts of data that were both sensitive and unprotected. For example, although some people tracked their finances on home computers in the 1990s, these programs generally implemented their own password-based encryption to protect the financial records.

Many of these reasons no longer apply. Today, the collection of files that a single user has under their control has grown so large that it is no longer reasonable to have a user interface that shows a user all their files and encourages browsing. Instead, systems for managing photographs, digital scans, and files in general typically show users only a small fraction of their holdings. As a result, it’s easy for users to forget that something has been archived.

More troubling, cell phone cameras have made it easy for people to create and exchange intimate images. A 2024 international study of 16,693 respondents distributed across ten countries found sharing of intimate images to be widespread in the teenage and adult population. It’s great that people are able to use computers to enhance their relationships! However, when those relationships go bad, those same images can cause a problem: the same study found that one in seven adults have been threatened by someone with whom they shared an image that the image would be further circulated, an act sometimes termed ‘sextortion.’4

3. Balancing the Needs of Users, Litigants, and Law Enforcement

As computer users, we all have an intuitive understanding of what it means to delete something. Problems of security and privacy arise because that understanding is not well specified. More specifically, many of us have different expectations of what deletion means, and different computer systems implement deletion differently, resulting in inconsistent mappings between our semantic notions of information eradication and the functionality that our systems provide.

‘Policy’ is a word that we can use to describe the intentions of system designers. So far, this brief case study has described four different deletion policies:

  • Policy #0—Never delete. In principle, it is possible to design a computer that never deletes files. Such computers require more and more storage as they run, but not as much as you might think. Modern source code control systems such as ‘git’ never delete anything; they simply hide old information so that it is not visible—but that old data can always be recovered. Cloud-based storage systems like Dropbox, Google Drive, and Microsoft OneDrive can be configured to store every version of every document for months or even years.

  • Policy #1—Indeterminate deletion. As we have seen, on most systems, what happens to data after the user presses delete is undefined. The physical blocks that hold the data for deleted files are not overwritten. These data can be recovered using special file recovery tools until the blocks are allocated to another purpose and overwritten.

  • Policy #2—Local complete delete. Systems can implement Complete Delete for local files, either by explicitly overwriting blocks when they are freed or by using cryptographic erasure. However, copies of files on other computers or in backup systems will be left unscathed.

  • Policy #3—Global complete delete. Every file can be encrypted such that each file can only be accessed using a centrally managed encryption key. Deleting the key assures that the file is now inaccessible everywhere. This also dramatically increases system complexity and impacts system performance, ultimately increasing costs.

It’s important to note that these policies are all quite abstract and need to be fleshed out. For example, each could be implemented at the file, directory, application, device, or system level. Also, while this discussion so far has focused on files within file systems, all of these concepts apply equally well to individual data records stored within a database, such as the address book and chat transcripts that the Android operating system stores in various SQLite3 databases.5

Although policy #0 is certainly possible in theory, it is unworkable in practice. Many users are uncomfortable knowing that everything they create on a computer will be captured and preserved for the indeterminate future. One aspect of personal privacy is being able to control information that we create; preserving everything ever created removes this aspect of control. Many people want to be able to delete information that they find embarrassing, lest someone else discover it. A person who commits a minor legal infraction that doesn’t have any obvious victims—for example, illegally parking in front of a fire hydrant for a few minutes to run into a store—might want to delete evidence such as a photograph that documents their offense. There are many reasons other than freeing up storage space that might motivate a person to delete information.

If you send a photo to someone, should you be able to decide when that photo gets deleted? That is, should you be able to delete the photo after you send it, so that the recipient can no longer view it or share it with others? Although this might seem to be an application for the DRM system described above, photographs protected with such a system could only be shared through a DRM-enabled application. That is, they could not be used with systems like Google Photos because those systems wouldn’t know how to get the decryption key to let the photo be viewed. Such systems would also not be foolproof; it would always be possible to use one cell phone to take a photo of another cell phone displaying a ‘protected’ image, a vulnerability termed the ‘analog hole.’6

3.1. Shielding Criminals from Law Enforcement

When designing computer systems, it is also important to anticipate their potential for abuse. It may be that systems can be implemented in a way that maximizes the potential for positive uses while minimizing the ways that they can be abused. Alternatively, it may be that the abuse potential is quite small, and addressing it would pose an undue burden on legitimate users. This tension is at the heart of the battle over the use of encryption in the United States, a battle that has raged since the 1970s.7

Until Apple introduced per-file encryption with iOS 8 in September 2014, most consumer computers implemented policy #1 (indeterminate deletion).8 As a result, for decades, law enforcement agencies have been able to use special digital forensics tools to recover deleted material from mobile phones, laptops, and servers that may have been used in the commission of a crime. Many law enforcement agencies are now frustrated by the privacy and security that iOS offers because it also helps to protect criminals.9

Some policymakers, who are likewise troubled by the idea of crooks using privacy-enhancing technologies like encryption, have proposed that computer systems have special modes to allow for law enforcement access. However, it is important to remember that “law enforcement” is a very broad term; access created by a manufacturer to allow the US Federal Bureau of Investigation to access encrypted files would also be available for use by security services in nondemocratic countries like Russia and China. The technologies would also be usable by countries that use their intelligence services for industrial espionage to help their domestic companies gain advantage over their international rivals.

3.2. Protecting Users from Themselves

Backups protect data from being lost due to hardware failures, hardware being lost or stolen, software bugs, ransomware, and user accident. Systems like Apple’s iPhoto that hide data when it is ‘deleted’ and then actually erase it at a later point in time allow users to recover data that they accidentally delete, or that they intentionally delete but about which they later change their minds.

However, it takes significant engineering to design systems that let users change their mind. Not only do these systems have to reliably hide data now and delete it later, they must also expose this functionality to users in a way that is easy to discover. Some users will invariably think that they have deleted something when they have not.
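A hide-now, delete-later design can be sketched as below. This is a minimal illustration, not any vendor's implementation: the `PhotoLibrary` class and its method names are hypothetical, and the 30-day retention window mirrors the ‘recently deleted’ behavior described earlier. The current time is passed in explicitly so that the behavior is easy to test.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=30)   # how long a hidden photo can still be recovered


class PhotoLibrary:
    """Hypothetical photo library with a 'recently deleted' holding area."""

    def __init__(self) -> None:
        self.photos = {}             # name -> image bytes shown to the user
        self.recently_deleted = {}   # name -> (image bytes, time of deletion)

    def add(self, name: str, data: bytes) -> None:
        self.photos[name] = data

    def delete(self, name: str, now: datetime) -> None:
        # Step 1: hide. The bytes are kept so that the user can change their mind.
        self.recently_deleted[name] = (self.photos.pop(name), now)

    def undelete(self, name: str) -> None:
        data, _deleted_at = self.recently_deleted.pop(name)
        self.photos[name] = data

    def purge_expired(self, now: datetime) -> list:
        # Step 2: delete later. Drop everything hidden for RETENTION or longer.
        expired = [name for name, (_data, deleted_at) in self.recently_deleted.items()
                   if now - deleted_at >= RETENTION]
        for name in expired:
            del self.recently_deleted[name]
        return expired


library = PhotoLibrary()
library.add("IMG_0037.JPG", b"photo bytes")
t0 = datetime(2024, 9, 1)

library.delete("IMG_0037.JPG", now=t0)
purged_early = library.purge_expired(now=t0 + timedelta(days=29))   # too soon to purge
library.undelete("IMG_0037.JPG")                                     # user changed their mind

library.delete("IMG_0037.JPG", now=t0)
purged_late = library.purge_expired(now=t0 + timedelta(days=30))    # window has elapsed
```

In this sketch `purge_expired` only drops the library's reference to the bytes. Whether the underlying storage is actually erased at that point depends on the policies discussed in Section 3, such as overwriting freed blocks or cryptographic erasure.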

3.3. Who Decides?

Another important design question is that of ‘design control’: Who should decide how systems behave when a file is deleted? Graphical user interfaces for laptops and cell phones have become steadily more complex since their introduction; users today have a bewildering number of configuration options, even on systems that are prized for their usability. Although engineers once hoped that a new generation of ‘digital natives’ who grew up with advanced technology would have no problems using these complex systems, that turned out to be wishful thinking.10

Currently, it is mostly engineers, user-experience designers, and product managers who decide how systems should implement deletion policy, when information should be just hidden, and whether information that is deleted should be completely erased or recoverable with forensic tools. Instead of technologists, these decisions could also be made by government policymakers, lawyers working for either the manufacturer or large customers, or even courts. Sometimes users are allowed to change these decisions through options, but generally, users cannot.

4. Conclusion

In order for people and organizations to make decisions about technical issues, they first need to recognize that there is a choice to be made.

While engineers and product managers now generally understand that residual information can create a privacy or security risk, addressing this risk still takes engineering effort and requires runtime resources, an investment that few organizations seem willing to make.

It is unclear how much users know about these risks. Some users clearly think that deleted files are in fact deleted. Yet, it is common to hear users give voice to the idea that anything that was once stored on a computer can never really be deleted. Sometimes users say this with hope—for example, after they have deleted important information that they need to recover. Sometimes they say this with a kind of helplessness and resignation about their lost privacy.

In fact, the indeterminacy of data remanence in modern information systems highlights an important paradox: Information is hard to delete, yet information is hard to retain. That is, it is incredibly difficult to maintain access to important information over an extended time, yet it is also incredibly difficult to completely delete information so that all copies everywhere are no longer accessible.

Most of this case study has been about the second half of that paradox, but the first is one that we live with as well. Consider that physical photo albums from the twentieth century are still widely available, while many of the electronic images created in the first two decades of digital photography have been lost entirely or are on media that their owners can no longer play.

What drives this paradox is the interplay of probability and indeterminacy. Although we would like to know with certainty if a file or piece of data will be preserved or destroyed, in practice these probabilities are not nearly as close to 1.0 as today’s user interfaces imply.

Discussion Questions

  1. Propose approaches for informing users of the existence of remnant data on their systems.

  2. Compare the advantages of deleting information with information permanence.

  3. Design a policy for addressing the data remanence issue in an end-user computer system.

  4. It is common for the ‘infotainment’ systems in rental cars to remember and display the names and address books of telephones that were recently paired with the car.11 Typically, this information is from previous rentals. Although some infotainment systems have a ‘system reset’ function that makes all of the records appear to vanish, it is rare for this function to be executed between rentals. Why might this be the case, and how would you design things differently to improve privacy protection? How could you assure that the captured data is actually overwritten, rather than simply left as deleted files that could be later recovered using forensic tools?

  5. For cryptographic erasure of a digital photograph to be truly complete, the image would need to be encrypted the moment it was taken and be decryptable by approved tools. Is there a way for such images to be edited, cropped, touched up, and incorporated into electronic publications like websites? If such a system could be designed, what would be the barriers to having it deployed?

  6. Object overwriting is a straightforward, well-understood approach for eliminating the ability to recover deleted data. However, software engineers have not made object overwriting mandatory on today’s computer systems because it would negatively impact performance and battery life. This is a decision that must be made at the system level; it cannot be made a user option. Do you think that systems should employ object overwriting, or do you think that the current approach of not overwriting is correct?

  7. Should law-abiding people and organizations be able to irrevocably erase data? If so, how would that impact the abilities of law enforcement? Discuss the elements that this policy choice shares with the debate over law enforcement access to encrypted data and how it is different.

  8. Some US policymakers believe that the US government should be able to access the encrypted contents of phones and other devices, but at the same time, they want US devices protected against foreign governments such as those of Russia, China, and Iran. US companies have responded that any capabilities made available to the US government would have to be made available to other governments as well, resulting in systems that are less secure for everyone. Argue one position or the other.

  9. ‘Redaction’ is a special kind of deletion in which specific words or sentences are removed from a document, or a portion of an image is obscured, to shield information or protect confidentiality. PDF-editing tools like Adobe Acrobat and Apple’s Preview application can redact information and leave black boxes in PDFs, but there is a long history of Microsoft Word users attempting to redact by drawing black boxes on a document with Word’s highlighter feature. Unfortunately, words obscured in this way can be easily recovered. Discuss how the design features and usability failings of these document-redaction approaches are like the issues in file deletion and how they are different.

Glossary

Block size. The smallest amount of data that a mass storage system can read or write at a time.

Cache. A storage area or system that is used for storing temporary copies of objects, usually to improve efficiency.

Rewritable storage. A system that allows data to be written and overwritten. Only the most recent version written to each location can be recovered.

Random-access storage. A storage system that allows chunks of data to be written or read in any order.

Residual data. Recoverable data that remains on a storage system after a data object is deleted by the user.

Storage. A system that allows data to be written (stored) and retrieved (read).

Further Reading

Diesburg, Sarah M. and An-I Andy Wang. “A Survey of Confidential Data Storage and Deletion Methods.” ACM Computing Surveys 43, no. 1 (November 2010): 1-37. https://doi.org/10.1145/1824795.1824797.

Garfinkel, Simson and Abhi Shelat. “Remembrance of Data Passed: A Study of Disk Sanitization Practices.” IEEE Security & Privacy 1, no. 1 (January/February 2003): 17-27. https://doi.org/10.1109/MSECP.2003.1176992.

Garfinkel, Simson. “Leaking Sensitive Information in Complex Document Files—and How to Prevent It.” IEEE Security & Privacy 12, no. 1 (January/February 2014): 20-27. https://doi.org/10.1109/MSP.2013.131.

Appendix: Memory Forensics

This case study is primarily concerned with files that are stored on mass storage devices, but files can also be stored in a computer’s random access memory (RAM). This memory is ‘volatile,’ meaning that it loses its contents when the device is turned off or reset; hard drives and flash memory are called ‘nonvolatile’ because they retain their data when the power is off. RAM is significantly faster than mass storage systems, but there is much less of it. A modern cell phone typically has just a few gigabytes of RAM, whereas it may have hundreds of gigabytes of nonvolatile storage.

Copies of a photo that are in RAM on a server or phone are inevitably the first that will be overwritten, as RAM is a limited resource and must be constantly reused. The term ‘data lifetime’ describes how long data lasts in a system. The practice of understanding the contents of RAM and extracting actionable data or evidence is called ‘memory forensics.’

How long an image remains in memory before that memory is needed for another purpose depends on many factors, including the size of the image, the amount of memory in the computer, the software that’s running on the computer, the amount of data that programs are using, and the computer’s operating system. In general, the more activity on a computer, the faster that memory will be overwritten when it is no longer needed, but sometimes some data may persist in RAM for days, weeks, or even longer. In 2004, researchers at Stanford argued that this haphazard approach to erasing deallocated memory is a system design failure impacting not only memory pages used for large digital objects (as in our example) but rewritable memory throughout the entire operating system.12 The following year, they demonstrated a modification to the Linux operating system that intentionally erased memory when it was no longer needed, which they termed ‘secure deallocation.’13

Few modern systems perform secure deallocation; the conventional wisdom is that it would result in a significant performance penalty. The Stanford team showed that there is a penalty, but it is not very large. “Surprisingly, zero-on-free overheads are less than 7% for all tested applications, despite the fact that these applications allocate hundreds or thousands of megabytes of data during their lifetime,” they concluded after rigorous testing.14
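The difference between ordinary and secure deallocation can be illustrated with a toy model. The sketch below is not the Stanford team’s Linux modification; it is a minimal Python illustration in which the class `PagePool`, its `zero_on_free` flag, and the helper `leaked_after_reuse` are all hypothetical names invented for this example. It shows only the core idea: when a recycled buffer is not overwritten on free, the next user of that buffer can still read the previous contents.

```python
class PagePool:
    """Toy allocator that recycles fixed-size buffers ("pages").

    Hypothetical illustration only; real operating systems manage
    memory very differently.
    """

    def __init__(self, page_size=4096, zero_on_free=False):
        self.page_size = page_size
        self.zero_on_free = zero_on_free
        self._free = []  # recycled pages waiting for reuse

    def allocate(self):
        # Reuse a freed page if one exists; otherwise create a fresh one.
        if self._free:
            return self._free.pop()
        return bytearray(self.page_size)

    def free(self, page):
        if self.zero_on_free:
            # Secure deallocation: overwrite the contents before recycling.
            page[:] = bytes(len(page))
        self._free.append(page)


def leaked_after_reuse(pool, secret):
    """Return True if `secret` is still readable after free and reuse."""
    page = pool.allocate()
    page[:len(secret)] = secret   # the first "program" stores its data
    pool.free(page)               # ...and releases the page
    reused = pool.allocate()      # a second "program" receives the same page
    return secret in bytes(reused)


if __name__ == "__main__":
    secret = b"private photo bytes"
    print(leaked_after_reuse(PagePool(zero_on_free=False), secret))
    print(leaked_after_reuse(PagePool(zero_on_free=True), secret))
```

In this model the cost of zero-on-free is one extra pass over each page at release time, which is the kind of overhead the Stanford measurements quantified for real applications.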

Memory forensics is a forensic technique for extracting and analyzing the contents of a computer’s volatile RAM. Like mass storage systems, RAM is organized in blocks, although RAM blocks are called ‘pages.’ Unlike mass storage systems, RAM can be written or read byte by byte and in any sequence. Memory forensics starts with the systematic extraction of the computer’s memory pages and recording that information in a single file, called a ‘memory image.’ The copy is typically made with special-purpose software or hardware.

Memory images can be used in development for system debugging, but they can also be used in an investigation. With a memory image, an analyst can hunt for malware that might have infected a computer; another use is to hunt for text messages, images, and other information that may remain in memory from the computer user. For example, a cell phone might contain text messages exchanged between a murderer and their victim. Even if the text messages are deleted, it might be possible to recover them from the unallocated memory of either person’s phone; the only way to know for sure is to make an image of the phone’s memory and then search for messages. Memory investigations can be time consuming and require significant expertise, so the capability tends to be reserved for the most important cases and is only used after other investigative tools have proved inconclusive.
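As a rough illustration of what searching a memory image involves, the following Python sketch scans a flat image for a known byte sequence and reports the page and offset of each match. It rests on several simplifying assumptions: the image is a single contiguous byte string, pages are 4,096 bytes, and the text is stored as plain bytes. The function name `find_remnants` is hypothetical, and real memory forensics tools must also contend with other text encodings, fragmented data, and operating system structures.

```python
def find_remnants(image, needle, page_size=4096):
    """Return (page_number, offset_in_page) for each occurrence of needle.

    `image` is assumed to be a flat bytes-like memory image.
    """
    hits = []
    start = image.find(needle)
    while start != -1:
        hits.append((start // page_size, start % page_size))
        # Continue searching just past the start of the previous match.
        start = image.find(needle, start + 1)
    return hits


if __name__ == "__main__":
    # Build a small synthetic three-page image with one leftover message.
    image = bytearray(3 * 4096)
    message = b"meet me at 9"
    image[4096 + 100:4096 + 100 + len(message)] = message
    print(find_remnants(bytes(image), message))
```

A search like this can only confirm that a specific, already-known string is present; locating unknown messages requires broader techniques, which is part of why such investigations are time consuming.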

There are primarily two approaches for protecting against memory forensics. The first is to make memory contents hard to extract using special-purpose hardware or software. For example, for decades it was common practice for laptop manufacturers to put sockets inside their computers to connect RAM and mass storage systems. This allowed the end-user to upgrade their laptop’s RAM and storage, but it also made it possible to remove the memory and forensically image the devices. (Although RAM typically loses its contents when it is powered off and removed from a computer, very cold RAM may retain its contents long enough to move from one computer to another.15) Today, nearly all laptops have RAM and solid-state drives soldered directly to the motherboard, making systems thinner and more reliable, but also making it harder for law enforcement to remove the devices for forensic imaging. On the software side, operating system facilities to let one process read another’s memory have been removed from many operating systems because the capability was widely used by malware.

Another approach to protect against memory forensics is to encrypt the contents of the RAM when it is not being used. Such approaches are not widely used today due to the overhead resulting from the need to constantly decrypt memory contents when they move from the physical memory chips into the microprocessor’s cache and then to re-encrypt them when they are flushed from the cache back to main memory. A technique known as ‘homomorphic encryption’ allows the computer to compute with encrypted data without decrypting it, although this approach also results in a considerable performance penalty.

Bibliography

Anon. 2017. “The Digital Native Is a Myth.” Nature 547 (July 27, 2017). https://www.nature.com/articles/547380a.

Boneh, Dan, and Richard J. Lipton. 1996. “A Revocable Backup System.” Sixth USENIX Security Symposium, https://www.usenix.org/conference/6th-usenix-security-symposium/revocable-backup-system.

Chow, Jim, Ben Pfaff, Tal Garfinkel, and Mendel Rosenblum. 2005. “Shredding Your Garbage: Reducing Data Lifetime Through Secure Deallocation.” 14th USENIX Security Symposium, https://www.usenix.org/conference/14th-usenix-security-symposium/shredding-your-garbage-reducing-data-lifetime-through.

Electronic Frontier Foundation. 2024. “Analog Hole.” Accessed August 21, 2024. https://www.eff.org/issues/analog-hole.

Garfinkel, Simson L. 2005. “Design Principles and Patterns for Computer Systems That Are Simultaneously Secure and Usable.” PhD Thesis, MIT.

Garfinkel, Tal, Ben Pfaff, Jim Chow, and Mendel Rosenblum. 2004. “Data Lifetime Is a Systems Problem.” In Proceedings of the 11th Workshop on ACM SIGOPS European Workshop, 10-es. EW 11. New York, NY, USA: Association for Computing Machinery, https://doi.org/10.1145/1133572.1133599.

Halderman, J. Alex, Seth D. Schoen, Nadia Heninger, William Clarkson, William Paul, Joseph A. Calandrino, Ariel J. Feldman, Jacob Appelbaum, and Edward W. Felten. 2009. “Lest We Remember: Cold-Boot Attacks on Encryption Keys.” Communications of the ACM 52, no. 5 (May 1, 2009): 91–98. https://doi.org/10.1145/1506409.1506429.

Henry, Nicola, and Rebecca Umbach. 2024. “Sextortion: Prevalence and Correlates in 10 Countries.” Computers in Human Behavior 158 (September 1, 2024): 108298. https://doi.org/10.1016/j.chb.2024.108298.

Johnson, Allie. 2021. “Q&A: I Left My Data on the Infotainment System of a Rental Car. Problem?,” July 15, 2021. https://us.norton.com/blog/how-to/left-data-on-rental-car-infotainment.

Levy, Steven. 2022. Crypto: How the Code Rebels Beat the Government—Saving Privacy in the Digital Age. New York: Penguin Publishing Group.

Pawlaszczyk, Dirk, and Christian Hummert. 2021. “Making the Invisible Visible – Techniques for Recovering Deleted SQLite Data Records.” International Journal of Cyber Forensics and Advanced Threat Investigations 1, no. 1–3 (February 15, 2021): 27–41. https://doi.org/10.46386/ijcfati.v1i1-3.17.
