//! Support for tracking the last time files were used to assist with cleaning
//! up those files if they haven't been used in a while.
//!
//! Tracking of cache files is stored in a sqlite database which contains a
//! timestamp of the last time the file was used, as well as the size of the
//! file.
//!
//! While cargo is running, when it detects a use of a cache file, it adds a
//! timestamp to [`DeferredGlobalLastUse`]. This batches up a set of changes
//! that are then flushed to the database all at once (via
//! [`DeferredGlobalLastUse::save`]). Ideally saving would only be done once
//! per invocation for performance reasons, but that is not really practical
//! because of the way cargo works: commands like `cargo generate-lockfile`,
//! `cargo fetch`, and `cargo build` all exercise this code in very different
//! ways.
//!
//! All of the database interaction is done through the [`GlobalCacheTracker`]
//! type.
//!
//! There is a single global [`GlobalCacheTracker`] and
//! [`DeferredGlobalLastUse`] stored in [`GlobalContext`].
//!
//! The high-level interface for performing garbage collection is defined in
//! the [`crate::core::gc`] module. The functions there are responsible for
//! interacting with the [`GlobalCacheTracker`] to handle cleaning of global
//! cache data.
//!
//! ## Automatic gc
//!
//! Some commands (primarily the build commands) will trigger an automatic
//! deletion of files that haven't been used in a while. The high-level
//! interface for this is the [`crate::core::gc::auto_gc`] function.
//!
//! The [`GlobalCacheTracker`] database tracks the last time an automatic gc
//! was performed so that it is only done once per day for performance
//! reasons.
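//!
//! A sketch of how that record is consulted (using the methods defined
//! below; the frequency shown is illustrative):
//!
//! ```ignore
//! if tracker.should_run_auto_gc(Duration::from_secs(60 * 60 * 24))? {
//!     // ... perform the automatic gc ...
//!     tracker.set_last_auto_gc()?;
//! }
//! ```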
//!
//! ## Manual gc
//!
//! The user can perform a manual garbage collection with the `cargo clean`
//! command. That command has a variety of options to specify what to delete.
//! Manual gc supports deleting based on age, size, or both. At a high
//! level, this is done by the [`crate::core::gc::Gc::gc`] method, which
//! calls into [`GlobalCacheTracker`] to handle all the cleaning.
//!
//! ## Locking
//!
//! Usage of the database requires that the package cache is locked to prevent
//! concurrent access. Although sqlite has built-in locking support, we want
//! to use cargo's locking so that the "Blocking" message gets displayed, and
//! so that locks can block indefinitely for long-running build commands.
//! [`rusqlite`] has a default timeout of 5 seconds, though that is
//! configurable.
//!
//! When garbage collection is being performed, the package cache lock must be
//! in [`CacheLockMode::MutateExclusive`] to ensure no other cargo process is
//! running. See [`crate::util::cache_lock`] for more detail on locking.
//!
//! When performing automatic gc, [`crate::core::gc::auto_gc`] will skip the
//! GC if the package cache lock is already held by anything else. Automatic
//! GC is intended to be opportunistic, and should cause as little disruption
//! to the user as possible.
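//!
//! A sketch of the locking expectation for gc (assuming
//! [`GlobalContext::acquire_package_cache_lock`] as the entry point):
//!
//! ```ignore
//! // Gc requires the strictest mode so no other cargo can run concurrently.
//! let _lock = gctx.acquire_package_cache_lock(CacheLockMode::MutateExclusive)?;
//! let mut tracker = GlobalCacheTracker::new(gctx)?;
//! ```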
//!
//! ## Compatibility
//!
//! The database must retain both forwards and backwards compatibility between
//! different versions of cargo. For the most part, this shouldn't be too
//! difficult to maintain. Generally sqlite doesn't change on-disk formats
//! between versions (the introduction of WAL is one of the few examples where
//! version 3 had a format change, but we wouldn't use it anyway since it has
//! shared-memory requirements cargo can't depend on due to things like
//! network mounts).
//!
//! Schema changes must be managed through [`migrations`] by adding new
//! entries that make a change to the database. Changes must not break older
//! versions of cargo. Generally, adding columns should be fine (either with a
//! default value, or NULL). Adding tables should also be fine. Just don't do
//! destructive things like removing a column, or changing the semantics of an
//! existing column.
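//!
//! For example, a hypothetical future migration adding a nullable column
//! would be appended to the end of the [`migrations`] list (the column name
//! here is purely illustrative):
//!
//! ```ignore
//! basic_migration("ALTER TABLE registry_src ADD COLUMN example INTEGER"),
//! ```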
//!
//! Since users may run older versions of cargo that do not do cache tracking,
//! the [`GlobalCacheTracker::sync_db_with_files`] method helps keep the
//! database in sync when older versions of cargo have touched the cache
//! directories.
//!
//! ## Performance
//!
//! Much of the design of this system focuses on minimizing its performance
//! impact. Every build command needs to save updates, and we try to keep
//! that from having a noticeable impact on build times. Systems like
//! Windows, particularly with a magnetic hard disk, can experience a fairly
//! large impact from cargo's overhead. Cargo's benchsuite has some
//! benchmarks to help compare different environments, or changes to the code
//! here. Please keep performance in mind if making any major changes.
//!
//! Performance of `cargo clean` is not quite as important since it is not
//! expected to be run often. However, it is still courteous not to slow the
//! user down too much. One performance concern is that the clean command
//! will synchronize the database with whatever is on disk if needed (in case
//! files were added by older versions of cargo that don't do cache tracking,
//! or if the user manually deleted some files). This can potentially be very
//! slow, especially if the two are very out of sync.
//!
//! ## Filesystems
//!
//! Everything here is sensitive to the kind of filesystem it is running on.
//! People tend to run cargo in all sorts of strange environments that have
//! limited capabilities, or on things like read-only mounts. The code here
//! needs to gracefully handle as many situations as possible.
//!
//! See also the information in the [Performance](#performance) and
//! [Locking](#locking) sections when considering different filesystems and
//! their impact on performance and locking.
//!
//! There are checks for read-only filesystems, and failures to record
//! tracking data in that situation are generally ignored.
use crate::core::gc::GcOpts;
use crate::core::Verbosity;
use crate::ops::CleanContext;
use crate::util::cache_lock::CacheLockMode;
use crate::util::interning::InternedString;
use crate::util::sqlite::{self, basic_migration, Migration};
use crate::util::{Filesystem, Progress, ProgressStyle};
use crate::{CargoResult, GlobalContext};
use anyhow::{bail, Context as _};
use cargo_util::paths;
use rusqlite::{params, Connection, ErrorCode};
use std::collections::{hash_map, HashMap};
use std::path::{Path, PathBuf};
use std::time::{Duration, SystemTime};
use tracing::{debug, trace};
/// The filename of the database.
const GLOBAL_CACHE_FILENAME: &str = ".global-cache";
const REGISTRY_INDEX_TABLE: &str = "registry_index";
const REGISTRY_CRATE_TABLE: &str = "registry_crate";
const REGISTRY_SRC_TABLE: &str = "registry_src";
const GIT_DB_TABLE: &str = "git_db";
const GIT_CO_TABLE: &str = "git_checkout";
/// How often timestamps will be updated.
///
/// As an optimization, timestamps are not updated unless they are older
/// than this number of seconds. This helps reduce the amount of disk I/O
/// when running cargo multiple times within a short window.
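///
/// For example, the flush logic only issues an UPDATE when `timestamp <
/// new_timestamp - UPDATE_RESOLUTION`, so repeated cargo invocations within
/// a five-minute window normally write each entry's timestamp only once.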
const UPDATE_RESOLUTION: u64 = 60 * 5;
/// Type for timestamps as stored in the database.
///
/// These are seconds since the Unix epoch.
type Timestamp = u64;
/// The key for a registry index entry stored in the database.
#[derive(Clone, Debug, Hash, Eq, PartialEq)]
pub struct RegistryIndex {
/// A unique name of the registry source.
pub encoded_registry_name: InternedString,
}
/// The key for a registry `.crate` entry stored in the database.
#[derive(Clone, Debug, Hash, Eq, PartialEq)]
pub struct RegistryCrate {
/// A unique name of the registry source.
pub encoded_registry_name: InternedString,
/// The filename of the compressed crate, like `foo-1.2.3.crate`.
pub crate_filename: InternedString,
/// The size of the `.crate` file.
pub size: u64,
}
/// The key for a registry src directory entry stored in the database.
#[derive(Clone, Debug, Hash, Eq, PartialEq)]
pub struct RegistrySrc {
/// A unique name of the registry source.
pub encoded_registry_name: InternedString,
/// The directory name of the extracted source, like `foo-1.2.3`.
pub package_dir: InternedString,
/// Total size of the src directory in bytes.
///
/// This can be None when the size is unknown. For example, when the src
/// directory already exists on disk, and we just want to update the
/// last-use timestamp. We don't want to take the expense of computing disk
/// usage unless necessary. [`GlobalCacheTracker::populate_untracked`]
/// will handle any actual NULL values in the database, which can happen
/// when the src directory is created by an older version of cargo that
/// did not track sizes.
pub size: Option<u64>,
}
/// The key for a git db entry stored in the database.
#[derive(Clone, Debug, Hash, Eq, PartialEq)]
pub struct GitDb {
/// A unique name of the git database.
pub encoded_git_name: InternedString,
}
/// The key for a git checkout entry stored in the database.
#[derive(Clone, Debug, Hash, Eq, PartialEq)]
pub struct GitCheckout {
/// A unique name of the git database.
pub encoded_git_name: InternedString,
    /// A unique name of the checkout, without the database name.
pub short_name: InternedString,
/// Total size of the checkout directory.
///
/// This can be None when the size is unknown. See [`RegistrySrc::size`]
/// for an explanation.
pub size: Option<u64>,
}
/// Filesystem paths in the global cache.
///
/// Accessing these assumes a lock has already been acquired.
struct BasePaths {
/// Root path to the index caches.
index: PathBuf,
/// Root path to the git DBs.
git_db: PathBuf,
/// Root path to the git checkouts.
git_co: PathBuf,
/// Root path to the `.crate` files.
crate_dir: PathBuf,
/// Root path to the `src` directories.
src: PathBuf,
}
/// Migrations which initialize the database, and can be used to evolve it over time.
///
/// See [`Migration`] for more detail.
///
/// **Be sure to not change the order or entries here!**
fn migrations() -> Vec<Migration> {
vec![
        // registry_index tracks the overall usage of an index cache, and
        // assigns a numeric ID that other tables use to refer to that index.
basic_migration(
"CREATE TABLE registry_index (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT UNIQUE NOT NULL,
timestamp INTEGER NOT NULL
)",
),
// .crate files
basic_migration(
"CREATE TABLE registry_crate (
registry_id INTEGER NOT NULL,
name TEXT NOT NULL,
size INTEGER NOT NULL,
timestamp INTEGER NOT NULL,
PRIMARY KEY (registry_id, name),
FOREIGN KEY (registry_id) REFERENCES registry_index (id) ON DELETE CASCADE
)",
),
// Extracted src directories
//
// Note that `size` can be NULL. This will happen when marking a src
// directory as used that was created by an older version of cargo
// that didn't do size tracking.
basic_migration(
"CREATE TABLE registry_src (
registry_id INTEGER NOT NULL,
name TEXT NOT NULL,
size INTEGER,
timestamp INTEGER NOT NULL,
PRIMARY KEY (registry_id, name),
FOREIGN KEY (registry_id) REFERENCES registry_index (id) ON DELETE CASCADE
)",
),
// Git db directories
basic_migration(
"CREATE TABLE git_db (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT UNIQUE NOT NULL,
timestamp INTEGER NOT NULL
)",
),
// Git checkout directories
basic_migration(
"CREATE TABLE git_checkout (
git_id INTEGER NOT NULL,
name TEXT UNIQUE NOT NULL,
size INTEGER,
timestamp INTEGER NOT NULL,
PRIMARY KEY (git_id, name),
FOREIGN KEY (git_id) REFERENCES git_db (id) ON DELETE CASCADE
)",
),
// This is a general-purpose single-row table that can store arbitrary
// data. Feel free to add columns (with ALTER TABLE) if necessary.
basic_migration(
"CREATE TABLE global_data (
last_auto_gc INTEGER NOT NULL
)",
),
// last_auto_gc tracks the last time auto-gc was run (so that it only
// runs roughly once a day for performance reasons). Prime it with the
// current time to establish a baseline.
Box::new(|conn| {
conn.execute(
"INSERT INTO global_data (last_auto_gc) VALUES (?1)",
[now()],
)?;
Ok(())
}),
]
}
/// Type for SQL columns that refer to the primary key of their parent table.
///
/// For example, `registry_crate.registry_id` refers to its parent `registry_index.id`.
#[derive(Copy, Clone, Debug, PartialEq)]
struct ParentId(i64);
impl rusqlite::types::FromSql for ParentId {
fn column_result(value: rusqlite::types::ValueRef<'_>) -> rusqlite::types::FromSqlResult<Self> {
let i = i64::column_result(value)?;
Ok(ParentId(i))
}
}
impl rusqlite::types::ToSql for ParentId {
fn to_sql(&self) -> rusqlite::Result<rusqlite::types::ToSqlOutput<'_>> {
Ok(rusqlite::types::ToSqlOutput::from(self.0))
}
}
/// Tracking for the global shared cache (registry files, etc.).
///
/// This is the interface to the global cache database, used for tracking and
/// cleaning. See the [`crate::core::global_cache_tracker`] module docs for
/// details.
#[derive(Debug)]
pub struct GlobalCacheTracker {
/// Connection to the SQLite database.
conn: Connection,
/// This is an optimization used to make sure cargo only checks if gc
/// needs to run once per session. This starts as `false`, and then the
/// first time it checks if automatic gc needs to run, it will be set to
/// `true`.
auto_gc_checked_this_session: bool,
}
impl GlobalCacheTracker {
/// Creates a new [`GlobalCacheTracker`].
///
/// The caller is responsible for locking the package cache with
/// [`CacheLockMode::DownloadExclusive`] before calling this.
pub fn new(gctx: &GlobalContext) -> CargoResult<GlobalCacheTracker> {
let db_path = Self::db_path(gctx);
// A package cache lock is required to ensure only one cargo is
// accessing at the same time. If there is concurrent access, we
// want to rely on cargo's own "Blocking" system (which can
// provide user feedback) rather than blocking inside sqlite
// (which by default has a short timeout).
let db_path = gctx.assert_package_cache_locked(CacheLockMode::DownloadExclusive, &db_path);
let mut conn = Connection::open(db_path)?;
conn.pragma_update(None, "foreign_keys", true)?;
sqlite::migrate(&mut conn, &migrations())?;
Ok(GlobalCacheTracker {
conn,
auto_gc_checked_this_session: false,
})
}
/// The path to the database.
pub fn db_path(gctx: &GlobalContext) -> Filesystem {
gctx.home().join(GLOBAL_CACHE_FILENAME)
}
    /// Given an encoded name, returns its ID in the given parent table
    /// (either `registry_index` or `git_db`).
///
/// Returns None if the given name isn't in the database.
fn id_from_name(
conn: &Connection,
table_name: &str,
encoded_name: &str,
) -> CargoResult<Option<ParentId>> {
let mut stmt =
conn.prepare_cached(&format!("SELECT id FROM {table_name} WHERE name = ?"))?;
match stmt.query_row([encoded_name], |row| row.get(0)) {
Ok(id) => Ok(Some(id)),
Err(rusqlite::Error::QueryReturnedNoRows) => Ok(None),
Err(e) => Err(e.into()),
}
}
/// Returns a map of ID to path for the given ids in the given table.
///
/// For example, given `registry_index` IDs, it returns filenames of the
/// form "index.crates.io-6f17d22bba15001f".
fn get_id_map(
conn: &Connection,
table_name: &str,
ids: &[i64],
) -> CargoResult<HashMap<i64, PathBuf>> {
let mut stmt =
conn.prepare_cached(&format!("SELECT name FROM {table_name} WHERE id = ?1"))?;
ids.iter()
.map(|id| {
let name = stmt.query_row(params![id], |row| {
Ok(PathBuf::from(row.get::<_, String>(0)?))
})?;
Ok((*id, name))
})
.collect()
}
/// Returns all index cache timestamps.
pub fn registry_index_all(&self) -> CargoResult<Vec<(RegistryIndex, Timestamp)>> {
let mut stmt = self
.conn
.prepare_cached("SELECT name, timestamp FROM registry_index")?;
let rows = stmt
.query_map([], |row| {
let encoded_registry_name = row.get_unwrap(0);
let timestamp = row.get_unwrap(1);
let kind = RegistryIndex {
encoded_registry_name,
};
Ok((kind, timestamp))
})?
.collect::<Result<Vec<_>, _>>()?;
Ok(rows)
}
/// Returns all registry crate cache timestamps.
pub fn registry_crate_all(&self) -> CargoResult<Vec<(RegistryCrate, Timestamp)>> {
let mut stmt = self.conn.prepare_cached(
"SELECT registry_index.name, registry_crate.name, registry_crate.size, registry_crate.timestamp
FROM registry_index, registry_crate
WHERE registry_crate.registry_id = registry_index.id",
)?;
let rows = stmt
.query_map([], |row| {
let encoded_registry_name = row.get_unwrap(0);
let crate_filename = row.get_unwrap(1);
let size = row.get_unwrap(2);
let timestamp = row.get_unwrap(3);
let kind = RegistryCrate {
encoded_registry_name,
crate_filename,
size,
};
Ok((kind, timestamp))
})?
.collect::<Result<Vec<_>, _>>()?;
Ok(rows)
}
/// Returns all registry source cache timestamps.
pub fn registry_src_all(&self) -> CargoResult<Vec<(RegistrySrc, Timestamp)>> {
let mut stmt = self.conn.prepare_cached(
"SELECT registry_index.name, registry_src.name, registry_src.size, registry_src.timestamp
FROM registry_index, registry_src
WHERE registry_src.registry_id = registry_index.id",
)?;
let rows = stmt
.query_map([], |row| {
let encoded_registry_name = row.get_unwrap(0);
let package_dir = row.get_unwrap(1);
let size = row.get_unwrap(2);
let timestamp = row.get_unwrap(3);
let kind = RegistrySrc {
encoded_registry_name,
package_dir,
size,
};
Ok((kind, timestamp))
})?
.collect::<Result<Vec<_>, _>>()?;
Ok(rows)
}
/// Returns all git db timestamps.
pub fn git_db_all(&self) -> CargoResult<Vec<(GitDb, Timestamp)>> {
let mut stmt = self
.conn
.prepare_cached("SELECT name, timestamp FROM git_db")?;
let rows = stmt
.query_map([], |row| {
let encoded_git_name = row.get_unwrap(0);
let timestamp = row.get_unwrap(1);
let kind = GitDb { encoded_git_name };
Ok((kind, timestamp))
})?
.collect::<Result<Vec<_>, _>>()?;
Ok(rows)
}
/// Returns all git checkout timestamps.
pub fn git_checkout_all(&self) -> CargoResult<Vec<(GitCheckout, Timestamp)>> {
let mut stmt = self.conn.prepare_cached(
"SELECT git_db.name, git_checkout.name, git_checkout.size, git_checkout.timestamp
FROM git_db, git_checkout
WHERE git_checkout.git_id = git_db.id",
)?;
let rows = stmt
.query_map([], |row| {
let encoded_git_name = row.get_unwrap(0);
let short_name = row.get_unwrap(1);
let size = row.get_unwrap(2);
let timestamp = row.get_unwrap(3);
let kind = GitCheckout {
encoded_git_name,
short_name,
size,
};
Ok((kind, timestamp))
})?
.collect::<Result<Vec<_>, _>>()?;
Ok(rows)
}
    /// Returns whether or not an automatic GC should be performed, based on
    /// the given frequency and the last time one was recorded in the database.
pub fn should_run_auto_gc(&mut self, frequency: Duration) -> CargoResult<bool> {
trace!(target: "gc", "should_run_auto_gc");
if self.auto_gc_checked_this_session {
return Ok(false);
}
let last_auto_gc: Timestamp =
self.conn
.query_row("SELECT last_auto_gc FROM global_data", [], |row| row.get(0))?;
let should_run = last_auto_gc + frequency.as_secs() < now();
trace!(target: "gc",
"last auto gc was {}, {}",
last_auto_gc,
if should_run { "running" } else { "skipping" }
);
self.auto_gc_checked_this_session = true;
Ok(should_run)
}
/// Writes to the database to indicate that an automatic GC has just been
/// completed.
pub fn set_last_auto_gc(&self) -> CargoResult<()> {
self.conn
.execute("UPDATE global_data SET last_auto_gc = ?1", [now()])?;
Ok(())
}
/// Deletes files from the global cache based on the given options.
pub fn clean(&mut self, clean_ctx: &mut CleanContext<'_>, gc_opts: &GcOpts) -> CargoResult<()> {
self.clean_inner(clean_ctx, gc_opts)
.with_context(|| "failed to clean entries from the global cache")
}
#[tracing::instrument(skip_all)]
fn clean_inner(
&mut self,
clean_ctx: &mut CleanContext<'_>,
gc_opts: &GcOpts,
) -> CargoResult<()> {
let gctx = clean_ctx.gctx;
let base = BasePaths {
index: gctx.registry_index_path().into_path_unlocked(),
git_db: gctx.git_db_path().into_path_unlocked(),
git_co: gctx.git_checkouts_path().into_path_unlocked(),
crate_dir: gctx.registry_cache_path().into_path_unlocked(),
src: gctx.registry_source_path().into_path_unlocked(),
};
let now = now();
trace!(target: "gc", "cleaning {gc_opts:?}");
let tx = self.conn.transaction()?;
let mut delete_paths = Vec::new();
// This can be an expensive operation, so only perform it if necessary.
if gc_opts.is_download_cache_opt_set() {
// TODO: Investigate how slow this might be.
Self::sync_db_with_files(
&tx,
now,
gctx,
&base,
gc_opts.is_download_cache_size_set(),
&mut delete_paths,
)
.with_context(|| "failed to sync tracking database")?
}
if let Some(max_age) = gc_opts.max_index_age {
let max_age = now - max_age.as_secs();
Self::get_registry_index_to_clean(&tx, max_age, &base, &mut delete_paths)?;
}
if let Some(max_age) = gc_opts.max_src_age {
let max_age = now - max_age.as_secs();
Self::get_registry_items_to_clean_age(
&tx,
max_age,
REGISTRY_SRC_TABLE,
&base.src,
&mut delete_paths,
)?;
}
if let Some(max_age) = gc_opts.max_crate_age {
let max_age = now - max_age.as_secs();
Self::get_registry_items_to_clean_age(
&tx,
max_age,
REGISTRY_CRATE_TABLE,
&base.crate_dir,
&mut delete_paths,
)?;
}
if let Some(max_age) = gc_opts.max_git_db_age {
let max_age = now - max_age.as_secs();
Self::get_git_db_items_to_clean(&tx, max_age, &base, &mut delete_paths)?;
}
if let Some(max_age) = gc_opts.max_git_co_age {
let max_age = now - max_age.as_secs();
Self::get_git_co_items_to_clean(&tx, max_age, &base.git_co, &mut delete_paths)?;
}
// Size collection must happen after date collection so that dates
// have precedence, since size constraints are a more blunt
// instrument.
//
// These are also complicated by the `--max-download-size` option
// overlapping with `--max-crate-size` and `--max-src-size`, which
// requires some coordination between those options which isn't
// necessary with the age-based options. An item's age is either older
// or it isn't, but contrast that with size which is based on the sum
// of all tracked items. Also, `--max-download-size` is summed against
// both the crate and src tracking, which requires combining them to
// compute the size, and then separating them to calculate the correct
// paths.
if let Some(max_size) = gc_opts.max_crate_size {
Self::get_registry_items_to_clean_size(
&tx,
max_size,
REGISTRY_CRATE_TABLE,
&base.crate_dir,
&mut delete_paths,
)?;
}
if let Some(max_size) = gc_opts.max_src_size {
Self::get_registry_items_to_clean_size(
&tx,
max_size,
REGISTRY_SRC_TABLE,
&base.src,
&mut delete_paths,
)?;
}
if let Some(max_size) = gc_opts.max_git_size {
Self::get_git_items_to_clean_size(&tx, max_size, &base, &mut delete_paths)?;
}
if let Some(max_size) = gc_opts.max_download_size {
Self::get_registry_items_to_clean_size_both(&tx, max_size, &base, &mut delete_paths)?;
}
clean_ctx.remove_paths(&delete_paths)?;
if clean_ctx.dry_run {
tx.rollback()?;
} else {
tx.commit()?;
}
Ok(())
}
    /// Returns a list of directory entry names in the given path.
fn names_from(path: &Path) -> CargoResult<Vec<String>> {
let entries = match path.read_dir() {
Ok(e) => e,
Err(e) => {
if e.kind() == std::io::ErrorKind::NotFound {
return Ok(Vec::new());
} else {
return Err(
anyhow::Error::new(e).context(format!("failed to read path `{path:?}`"))
);
}
}
};
let names = entries
.filter_map(|entry| entry.ok()?.file_name().into_string().ok())
.collect();
Ok(names)
}
/// Synchronizes the database to match the files on disk.
///
/// This performs the following cleanups:
///
    /// 1. Removes entries from the database that are missing on disk.
    /// 2. Adds missing entries to the database that are on disk (such as
    ///    when files are added by older versions of cargo).
    /// 3. Fills in the `size` column where it is NULL (such as when
    ///    something is added to disk by an older version of cargo, and one
    ///    of the mark functions marked it without knowing the size).
    ///
    ///    Size computations are only done if `sync_size` is set since it can
    ///    be a very expensive operation. This should only be set if the user
    ///    requested to clean based on the cache size.
    /// 4. Checks for orphaned files. For example, if there are `.crate`
    ///    files associated with an index that does not exist.
    ///
    ///    These orphaned files will be added to `delete_paths` so that the
    ///    caller can delete them.
#[tracing::instrument(skip(conn, gctx, base, delete_paths))]
fn sync_db_with_files(
conn: &Connection,
now: Timestamp,
gctx: &GlobalContext,
base: &BasePaths,
sync_size: bool,
delete_paths: &mut Vec<PathBuf>,
) -> CargoResult<()> {
debug!(target: "gc", "starting db sync");
// For registry_index and git_db, add anything that is missing in the db.
Self::update_parent_for_missing_from_db(conn, now, REGISTRY_INDEX_TABLE, &base.index)?;
Self::update_parent_for_missing_from_db(conn, now, GIT_DB_TABLE, &base.git_db)?;
// For registry_crate, registry_src, and git_checkout, remove anything
// from the db that isn't on disk.
Self::update_db_for_removed(
conn,
REGISTRY_INDEX_TABLE,
"registry_id",
REGISTRY_CRATE_TABLE,
&base.crate_dir,
)?;
Self::update_db_for_removed(
conn,
REGISTRY_INDEX_TABLE,
"registry_id",
REGISTRY_SRC_TABLE,
&base.src,
)?;
Self::update_db_for_removed(conn, GIT_DB_TABLE, "git_id", GIT_CO_TABLE, &base.git_co)?;
// For registry_index and git_db, remove anything from the db that
// isn't on disk.
//
// This also collects paths for any child files that don't have their
// respective parent on disk.
Self::update_db_parent_for_removed_from_disk(
conn,
REGISTRY_INDEX_TABLE,
&base.index,
&[&base.crate_dir, &base.src],
delete_paths,
)?;
Self::update_db_parent_for_removed_from_disk(
conn,
GIT_DB_TABLE,
&base.git_db,
&[&base.git_co],
delete_paths,
)?;
// For registry_crate, registry_src, and git_checkout, add anything
// that is missing in the db.
Self::populate_untracked_crate(conn, now, &base.crate_dir)?;
Self::populate_untracked(
conn,
now,
gctx,
REGISTRY_INDEX_TABLE,
"registry_id",
REGISTRY_SRC_TABLE,
&base.src,
sync_size,
)?;
Self::populate_untracked(
conn,
now,
gctx,
GIT_DB_TABLE,
"git_id",
GIT_CO_TABLE,
&base.git_co,
sync_size,
)?;
// Update any NULL sizes if needed.
if sync_size {
Self::update_null_sizes(
conn,
gctx,
REGISTRY_INDEX_TABLE,
"registry_id",
REGISTRY_SRC_TABLE,
&base.src,
)?;
Self::update_null_sizes(
conn,
gctx,
GIT_DB_TABLE,
"git_id",
GIT_CO_TABLE,
&base.git_co,
)?;
}
Ok(())
}
/// For parent tables, add any entries that are on disk but aren't tracked in the db.
#[tracing::instrument(skip(conn, now, base_path))]
fn update_parent_for_missing_from_db(
conn: &Connection,
now: Timestamp,
parent_table_name: &str,
base_path: &Path,
) -> CargoResult<()> {
trace!(target: "gc", "checking for untracked parent to add to {parent_table_name}");
let names = Self::names_from(base_path)?;
let mut stmt = conn.prepare_cached(&format!(
"INSERT INTO {parent_table_name} (name, timestamp)
VALUES (?1, ?2)
ON CONFLICT DO NOTHING",
))?;
for name in names {
stmt.execute(params![name, now])?;
}
Ok(())
}
/// Removes database entries for any files that are not on disk for the child tables.
///
    /// This can happen, for example, if the user manually deleted the file,
    /// or in any other scenario where the filesystem and db are out of sync.
#[tracing::instrument(skip(conn, base_path))]
fn update_db_for_removed(
conn: &Connection,
parent_table_name: &str,
id_column_name: &str,
table_name: &str,
base_path: &Path,
) -> CargoResult<()> {
trace!(target: "gc", "checking for db entries to remove from {table_name}");
let mut select_stmt = conn.prepare_cached(&format!(
"SELECT {table_name}.rowid, {parent_table_name}.name, {table_name}.name
FROM {parent_table_name}, {table_name}
WHERE {table_name}.{id_column_name} = {parent_table_name}.id",
))?;
let mut delete_stmt =
conn.prepare_cached(&format!("DELETE FROM {table_name} WHERE rowid = ?1"))?;
let mut rows = select_stmt.query([])?;
while let Some(row) = rows.next()? {
let rowid: i64 = row.get_unwrap(0);
let id_name: String = row.get_unwrap(1);
let name: String = row.get_unwrap(2);
if !base_path.join(id_name).join(name).exists() {
delete_stmt.execute([rowid])?;
}
}
Ok(())
}
/// Removes database entries for any files that are not on disk for the parent tables.
#[tracing::instrument(skip(conn, base_path, child_base_paths, delete_paths))]
fn update_db_parent_for_removed_from_disk(
conn: &Connection,
parent_table_name: &str,
base_path: &Path,
child_base_paths: &[&Path],
delete_paths: &mut Vec<PathBuf>,
) -> CargoResult<()> {
trace!(target: "gc", "checking for db entries to remove from {parent_table_name}");
let mut select_stmt =
conn.prepare_cached(&format!("SELECT rowid, name FROM {parent_table_name}"))?;
let mut delete_stmt =
conn.prepare_cached(&format!("DELETE FROM {parent_table_name} WHERE rowid = ?1"))?;
let mut rows = select_stmt.query([])?;
while let Some(row) = rows.next()? {
let rowid: i64 = row.get_unwrap(0);
let id_name: String = row.get_unwrap(1);
if !base_path.join(&id_name).exists() {
delete_stmt.execute([rowid])?;
// Make sure any child data is also cleaned up.
for child_base in child_base_paths {
let child_path = child_base.join(&id_name);
if child_path.exists() {
debug!(target: "gc", "removing orphaned path {child_path:?}");
delete_paths.push(child_path);
}
}
}
}
Ok(())
}
/// Updates the database to add any `.crate` files that are currently
/// not tracked (such as when they are downloaded by an older version of
/// cargo).
#[tracing::instrument(skip(conn, now, base_path))]
fn populate_untracked_crate(
conn: &Connection,
now: Timestamp,
base_path: &Path,
) -> CargoResult<()> {
trace!(target: "gc", "populating untracked crate files");
let mut insert_stmt = conn.prepare_cached(
"INSERT INTO registry_crate (registry_id, name, size, timestamp)
VALUES (?1, ?2, ?3, ?4)
ON CONFLICT DO NOTHING",
)?;
let index_names = Self::names_from(&base_path)?;
for index_name in index_names {
let Some(id) = Self::id_from_name(conn, REGISTRY_INDEX_TABLE, &index_name)? else {
// The id is missing from the database. This should be resolved
// via update_db_parent_for_removed_from_disk.
continue;
};
let index_path = base_path.join(index_name);
for crate_name in Self::names_from(&index_path)? {
if crate_name.ends_with(".crate") {
// Missing files should have already been taken care of by
// update_db_for_removed.
let size = paths::metadata(index_path.join(&crate_name))?.len();
insert_stmt.execute(params![id, crate_name, size, now])?;
}
}
}
Ok(())
}
/// Updates the database to add any files that are currently not tracked
/// (such as when they are downloaded by an older version of cargo).
#[tracing::instrument(skip(conn, now, gctx, base_path, populate_size))]
fn populate_untracked(
conn: &Connection,
now: Timestamp,
gctx: &GlobalContext,
id_table_name: &str,
id_column_name: &str,
table_name: &str,
base_path: &Path,
populate_size: bool,
) -> CargoResult<()> {
trace!(target: "gc", "populating untracked files for {table_name}");
// Gather names (and make sure they are in the database).
let id_names = Self::names_from(&base_path)?;
// This SELECT is used to determine if the directory is already
// tracked. We don't want to do the expensive size computation unless
// necessary.
let mut select_stmt = conn.prepare_cached(&format!(
"SELECT 1 FROM {table_name}
WHERE {id_column_name} = ?1 AND name = ?2",
))?;
let mut insert_stmt = conn.prepare_cached(&format!(
"INSERT INTO {table_name} ({id_column_name}, name, size, timestamp)
VALUES (?1, ?2, ?3, ?4)
ON CONFLICT DO NOTHING",
))?;
let mut progress = Progress::with_style("Scanning", ProgressStyle::Ratio, gctx);
// Compute the size of any directory not in the database.
for id_name in id_names {
let Some(id) = Self::id_from_name(conn, id_table_name, &id_name)? else {
// The id is missing from the database. This should be resolved
// via update_db_parent_for_removed_from_disk.
continue;
};
let index_path = base_path.join(id_name);
let names = Self::names_from(&index_path)?;
let max = names.len();
for (i, name) in names.iter().enumerate() {
if select_stmt.exists(params![id, name])? {
continue;
}
let dir_path = index_path.join(name);
if !dir_path.is_dir() {
continue;
}
progress.tick(i, max, "")?;
let size = if populate_size {
Some(du(&dir_path, table_name)?)
} else {
None
};
insert_stmt.execute(params![id, name, size, now])?;
}
}
Ok(())
}
/// Fills in the `size` column where it is NULL.
///
/// This can happen when something is added to disk by an older version of
/// cargo, and one of the mark functions marked it without knowing the
/// size.
///
/// `update_db_for_removed` should be called before this is called.
#[tracing::instrument(skip(conn, gctx, base_path))]
fn update_null_sizes(
conn: &Connection,
gctx: &GlobalContext,
parent_table_name: &str,
id_column_name: &str,
table_name: &str,
base_path: &Path,
) -> CargoResult<()> {
trace!(target: "gc", "updating NULL size information in {table_name}");
let mut null_stmt = conn.prepare_cached(&format!(
"SELECT {table_name}.rowid, {table_name}.name, {parent_table_name}.name
FROM {table_name}, {parent_table_name}
WHERE {table_name}.size IS NULL AND {table_name}.{id_column_name} = {parent_table_name}.id",
))?;
let mut update_stmt = conn.prepare_cached(&format!(
"UPDATE {table_name} SET size = ?1 WHERE rowid = ?2"
))?;
let mut progress = Progress::with_style("Scanning", ProgressStyle::Ratio, gctx);
let rows: Vec<_> = null_stmt
.query_map([], |row| {
Ok((row.get_unwrap(0), row.get_unwrap(1), row.get_unwrap(2)))
})?
.collect();
let max = rows.len();
for (i, row) in rows.into_iter().enumerate() {
let (rowid, name, id_name): (i64, String, String) = row?;
let path = base_path.join(id_name).join(name);
progress.tick(i, max, "")?;
// Missing files should have already been taken care of by
// update_db_for_removed.
let size = du(&path, table_name)?;
update_stmt.execute(params![size, rowid])?;
}
Ok(())
}
    /// Adds paths to delete from either `registry_crate` or `registry_src`
    /// whose last use is older than the given timestamp.
fn get_registry_items_to_clean_age(
conn: &Connection,
max_age: Timestamp,
table_name: &str,
base_path: &Path,
delete_paths: &mut Vec<PathBuf>,
) -> CargoResult<()> {
debug!(target: "gc", "cleaning {table_name} since {max_age:?}");
let mut stmt = conn.prepare_cached(&format!(
"DELETE FROM {table_name} WHERE timestamp < ?1
RETURNING registry_id, name"
))?;
let rows = stmt
.query_map(params![max_age], |row| {
let registry_id = row.get_unwrap(0);
let name: String = row.get_unwrap(1);
Ok((registry_id, name))
})?
.collect::<Result<Vec<_>, _>>()?;
let ids: Vec<_> = rows.iter().map(|r| r.0).collect();
let id_map = Self::get_id_map(conn, REGISTRY_INDEX_TABLE, &ids)?;
for (id, name) in rows {
let encoded_registry_name = &id_map[&id];
delete_paths.push(base_path.join(encoded_registry_name).join(name));
}
Ok(())
}
/// Adds paths to delete from either `registry_crate` or `registry_src` in
/// order to keep the total size under the given max size.
fn get_registry_items_to_clean_size(
conn: &Connection,
max_size: u64,
table_name: &str,
base_path: &Path,
delete_paths: &mut Vec<PathBuf>,
) -> CargoResult<()> {
debug!(target: "gc", "cleaning {table_name} till under {max_size:?}");
let total_size: u64 = conn.query_row(
&format!("SELECT coalesce(SUM(size), 0) FROM {table_name}"),
[],
|row| row.get(0),
)?;
if total_size <= max_size {
return Ok(());
}
// This SQL statement selects all of the rows ordered by timestamp,
// and then uses a window function to keep a running total of the
// size. It selects all rows until the running total exceeds the
// threshold of the total number of bytes that we want to delete.
//
// The window function essentially computes an aggregate over all
// previous rows as it goes along. As long as the running size is
// below the total amount that we need to delete, it keeps picking
// more rows.
//
// The ORDER BY includes `name` mainly for test purposes so that
// entries with the same timestamp have deterministic behavior.
//
// The coalesce helps convert NULL to 0.
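        //
        // Illustrative example (assumed numbers): with a max_size of 100 and
        // rows of sizes 40, 50, 30, 20 ordered oldest-first (total 140), the
        // threshold ?1 is 40. The running totals are 40, 90, 120, 140, so
        // `running_amount - size < 40` holds only for the oldest row
        // (0 < 40), deleting 40 bytes and bringing the total down to 100.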
let mut stmt = conn.prepare(&format!(
"DELETE FROM {table_name} WHERE rowid IN \
(SELECT x.rowid FROM \
(SELECT rowid, size, SUM(size) OVER \
(ORDER BY timestamp, name ROWS UNBOUNDED PRECEDING) AS running_amount \
FROM {table_name}) x \
WHERE coalesce(x.running_amount, 0) - x.size < ?1) \
RETURNING registry_id, name;"
))?;
let rows = stmt
.query_map(params![total_size - max_size], |row| {
let id = row.get_unwrap(0);
let name: String = row.get_unwrap(1);
Ok((id, name))
})?
.collect::<Result<Vec<_>, _>>()?;
// Convert registry_id to the encoded registry name, and join those.
let ids: Vec<_> = rows.iter().map(|r| r.0).collect();
let id_map = Self::get_id_map(conn, REGISTRY_INDEX_TABLE, &ids)?;
for (id, name) in rows {
let encoded_name = &id_map[&id];
delete_paths.push(base_path.join(encoded_name).join(name));
}
Ok(())
}
/// Adds paths to delete from both `registry_crate` and `registry_src` in
/// order to keep the total size under the given max size.
fn get_registry_items_to_clean_size_both(
conn: &Connection,
max_size: u64,
base: &BasePaths,
delete_paths: &mut Vec<PathBuf>,
) -> CargoResult<()> {
debug!(target: "gc", "cleaning download till under {max_size:?}");
// This SQL statement selects from both registry_src and
// registry_crate so that sorting of timestamps incorporates both of
// them at the same time. It uses a const value of 1 or 2 as the first
// column so that the code below can determine which table the value
// came from.
let mut stmt = conn.prepare_cached(
"SELECT 1, registry_src.rowid, registry_src.name AS name, registry_index.name,
registry_src.size, registry_src.timestamp AS timestamp
FROM registry_src, registry_index
WHERE registry_src.registry_id = registry_index.id AND registry_src.size NOT NULL
UNION
SELECT 2, registry_crate.rowid, registry_crate.name AS name, registry_index.name,
registry_crate.size, registry_crate.timestamp AS timestamp
FROM registry_crate, registry_index
WHERE registry_crate.registry_id = registry_index.id
ORDER BY timestamp, name",
)?;
let mut delete_src_stmt =
conn.prepare_cached("DELETE FROM registry_src WHERE rowid = ?1")?;
let mut delete_crate_stmt =
conn.prepare_cached("DELETE FROM registry_crate WHERE rowid = ?1")?;
let rows = stmt
.query_map([], |row| {
Ok((
row.get_unwrap(0),
row.get_unwrap(1),
row.get_unwrap(2),
row.get_unwrap(3),
row.get_unwrap(4),
))
})?
.collect::<Result<Vec<(i64, i64, String, String, u64)>, _>>()?;
let mut total_size: u64 = rows.iter().map(|r| r.4).sum();
debug!(target: "gc", "total download cache size appears to be {total_size}");
for (table, rowid, name, index_name, size) in rows {
if total_size <= max_size {
break;
}
if table == 1 {
delete_paths.push(base.src.join(index_name).join(name));
delete_src_stmt.execute([rowid])?;
} else {
delete_paths.push(base.crate_dir.join(index_name).join(name));
delete_crate_stmt.execute([rowid])?;
}
// TODO: If delete crate, ensure src is also deleted.
total_size -= size;
}
Ok(())
}
    /// Adds paths to delete from the git cache (both git dbs and checkouts),
    /// keeping the total size under the given value.
fn get_git_items_to_clean_size(
conn: &Connection,
max_size: u64,
base: &BasePaths,
delete_paths: &mut Vec<PathBuf>,
) -> CargoResult<()> {
debug!(target: "gc", "cleaning git till under {max_size:?}");
// Collect all the sizes from git_db and git_checkouts, and then sort them by timestamp.
let mut stmt = conn.prepare_cached("SELECT rowid, name, timestamp FROM git_db")?;
let mut git_info = stmt
.query_map([], |row| {
let rowid: i64 = row.get_unwrap(0);
let name: String = row.get_unwrap(1);
let timestamp: Timestamp = row.get_unwrap(2);
// Size is added below so that the error doesn't need to be
// converted to a rusqlite error.
Ok((timestamp, rowid, None, name, 0))
})?
.collect::<Result<Vec<_>, _>>()?;
for info in &mut git_info {
let size = cargo_util::du(&base.git_db.join(&info.3), &[])?;
info.4 = size;
}
let mut stmt = conn.prepare_cached(
"SELECT git_checkout.rowid, git_db.name, git_checkout.name,
git_checkout.size, git_checkout.timestamp
FROM git_checkout, git_db
WHERE git_checkout.git_id = git_db.id AND git_checkout.size NOT NULL",
)?;
let git_co_rows = stmt
.query_map([], |row| {
let rowid = row.get_unwrap(0);
let db_name: String = row.get_unwrap(1);
let name = row.get_unwrap(2);
let size = row.get_unwrap(3);
let timestamp = row.get_unwrap(4);
Ok((timestamp, rowid, Some(db_name), name, size))
})?
.collect::<Result<Vec<_>, _>>()?;
git_info.extend(git_co_rows);
        // Sort by timestamp and name (descending, so that `pop()` below
        // yields the oldest entry first). The name is included mostly for
        // test purposes so that entries with the same timestamp have
        // deterministic behavior.
git_info.sort_by(|a, b| (b.0, &b.3).cmp(&(a.0, &a.3)));
// Collect paths to delete.
let mut delete_db_stmt = conn.prepare_cached("DELETE FROM git_db WHERE rowid = ?1")?;
let mut delete_co_stmt =
conn.prepare_cached("DELETE FROM git_checkout WHERE rowid = ?1")?;
let mut total_size: u64 = git_info.iter().map(|r| r.4).sum();
debug!(target: "gc", "total git cache size appears to be {total_size}");
while let Some((_timestamp, rowid, db_name, name, size)) = git_info.pop() {
if total_size <= max_size {
break;
}
if let Some(db_name) = db_name {
delete_paths.push(base.git_co.join(db_name).join(name));
delete_co_stmt.execute([rowid])?;
total_size -= size;
} else {
total_size -= size;
delete_paths.push(base.git_db.join(&name));
delete_db_stmt.execute([rowid])?;
// If the db is deleted, then all the checkouts must be deleted.
let mut i = 0;
while i < git_info.len() {
if git_info[i].2.as_deref() == Some(name.as_ref()) {
let (_, rowid, db_name, name, size) = git_info.remove(i);
delete_paths.push(base.git_co.join(db_name.unwrap()).join(name));
delete_co_stmt.execute([rowid])?;
total_size -= size;
} else {
i += 1;
}
}
}
}
Ok(())
}
/// Adds paths to delete from `registry_index` whose last use is older
/// than the given timestamp.
fn get_registry_index_to_clean(
conn: &Connection,
max_age: Timestamp,
base: &BasePaths,
delete_paths: &mut Vec<PathBuf>,
) -> CargoResult<()> {
debug!(target: "gc", "cleaning index since {max_age:?}");
let mut stmt = conn.prepare_cached(
"DELETE FROM registry_index WHERE timestamp < ?1
RETURNING name",
)?;
let mut rows = stmt.query([max_age])?;
while let Some(row) = rows.next()? {
let name: String = row.get_unwrap(0);
delete_paths.push(base.index.join(&name));
// Also delete .crate and src directories, since by definition
// they cannot be used without their index.
delete_paths.push(base.src.join(&name));
delete_paths.push(base.crate_dir.join(&name));
}
Ok(())
}
/// Adds paths to delete from `git_checkout` whose last use is
/// older than the given timestamp.
fn get_git_co_items_to_clean(
conn: &Connection,
max_age: Timestamp,
base_path: &Path,
delete_paths: &mut Vec<PathBuf>,
) -> CargoResult<()> {
debug!(target: "gc", "cleaning git co since {max_age:?}");
let mut stmt = conn.prepare_cached(
"DELETE FROM git_checkout WHERE timestamp < ?1
RETURNING git_id, name",
)?;
let rows = stmt
.query_map(params![max_age], |row| {
let git_id = row.get_unwrap(0);
let name: String = row.get_unwrap(1);
Ok((git_id, name))
})?
.collect::<Result<Vec<_>, _>>()?;
let ids: Vec<_> = rows.iter().map(|r| r.0).collect();
let id_map = Self::get_id_map(conn, GIT_DB_TABLE, &ids)?;
for (id, name) in rows {
let encoded_git_name = &id_map[&id];
delete_paths.push(base_path.join(encoded_git_name).join(name));
}
Ok(())
}
    /// Adds paths to delete from `git_db` whose last use is older than the
    /// given timestamp.
fn get_git_db_items_to_clean(
conn: &Connection,
max_age: Timestamp,
base: &BasePaths,
delete_paths: &mut Vec<PathBuf>,
) -> CargoResult<()> {
debug!(target: "gc", "cleaning git db since {max_age:?}");
let mut stmt = conn.prepare_cached(
"DELETE FROM git_db WHERE timestamp < ?1
RETURNING name",
)?;
let mut rows = stmt.query([max_age])?;
while let Some(row) = rows.next()? {
let name: String = row.get_unwrap(0);
delete_paths.push(base.git_db.join(&name));
// Also delete checkout directories, since by definition they
// cannot be used without their db.
delete_paths.push(base.git_co.join(&name));
}
Ok(())
}
}
/// Helper to generate the upsert for the parent tables.
///
/// This handles checking if the row already exists, and only updates the
/// timestamp if it hasn't been updated recently. This also handles keeping
/// a cached map of the `id` value.
///
/// Unfortunately it is a bit tricky to share this code without a macro.
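///
/// For example, [`DeferredGlobalLastUse::insert_registry_index_from_cache`]
/// invokes it as:
///
/// ```ignore
/// insert_or_update_parent!(self, conn, "registry_index",
///     registry_index_timestamps, registry_keys, encoded_registry_name);
/// ```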
macro_rules! insert_or_update_parent {
($self:expr, $conn:expr, $table_name:expr, $timestamps_field:ident, $keys_field:ident, $encoded_name:ident) => {
let mut select_stmt = $conn.prepare_cached(concat!(
"SELECT id, timestamp FROM ",
$table_name,
" WHERE name = ?1"
))?;
let mut insert_stmt = $conn.prepare_cached(concat!(
"INSERT INTO ",
$table_name,
" (name, timestamp)
VALUES (?1, ?2)
ON CONFLICT DO UPDATE SET timestamp=excluded.timestamp
RETURNING id",
))?;
let mut update_stmt = $conn.prepare_cached(concat!(
"UPDATE ",
$table_name,
" SET timestamp = ?1 WHERE id = ?2"
))?;
for (parent, new_timestamp) in std::mem::take(&mut $self.$timestamps_field) {
trace!(target: "gc",
concat!("insert ", $table_name, " {:?} {}"),
parent,
new_timestamp
);
let mut rows = select_stmt.query([parent.$encoded_name])?;
let id = if let Some(row) = rows.next()? {
let id: ParentId = row.get_unwrap(0);
let timestamp: Timestamp = row.get_unwrap(1);
if timestamp < new_timestamp - UPDATE_RESOLUTION {
update_stmt.execute(params![new_timestamp, id])?;
}
id
} else {
insert_stmt.query_row(params![parent.$encoded_name, new_timestamp], |row| {
row.get(0)
})?
};
match $self.$keys_field.entry(parent.$encoded_name) {
hash_map::Entry::Occupied(o) => {
assert_eq!(*o.get(), id);
}
hash_map::Entry::Vacant(v) => {
v.insert(id);
}
}
}
return Ok(());
};
}
/// This is a cache of modifications that will be saved to disk all at once
/// via the [`DeferredGlobalLastUse::save`] method.
///
/// This is here to improve performance.
#[derive(Debug)]
pub struct DeferredGlobalLastUse {
/// Cache of registry keys, used for faster fetching.
///
/// The key is the registry name (which is its directory name) and the
/// value is the `id` in the `registry_index` table.
registry_keys: HashMap<InternedString, ParentId>,
/// Cache of git keys, used for faster fetching.
///
/// The key is the git db name (which is its directory name) and the value
/// is the `id` in the `git_db` table.
git_keys: HashMap<InternedString, ParentId>,
/// New registry index entries to insert.
registry_index_timestamps: HashMap<RegistryIndex, Timestamp>,
/// New registry `.crate` entries to insert.
registry_crate_timestamps: HashMap<RegistryCrate, Timestamp>,
/// New registry src directory entries to insert.
registry_src_timestamps: HashMap<RegistrySrc, Timestamp>,
/// New git db entries to insert.
git_db_timestamps: HashMap<GitDb, Timestamp>,
/// New git checkout entries to insert.
git_checkout_timestamps: HashMap<GitCheckout, Timestamp>,
/// This is used so that a warning about failing to update the database is
/// only displayed once.
save_err_has_warned: bool,
    /// The current time, cached to improve performance by avoiding accessing
    /// the clock hundreds of times.
now: Timestamp,
}
impl DeferredGlobalLastUse {
pub fn new() -> DeferredGlobalLastUse {
DeferredGlobalLastUse {
registry_keys: HashMap::new(),
git_keys: HashMap::new(),
registry_index_timestamps: HashMap::new(),
registry_crate_timestamps: HashMap::new(),
registry_src_timestamps: HashMap::new(),
git_db_timestamps: HashMap::new(),
git_checkout_timestamps: HashMap::new(),
save_err_has_warned: false,
now: now(),
}
}
pub fn is_empty(&self) -> bool {
self.registry_index_timestamps.is_empty()
&& self.registry_crate_timestamps.is_empty()
&& self.registry_src_timestamps.is_empty()
&& self.git_db_timestamps.is_empty()
&& self.git_checkout_timestamps.is_empty()
}
fn clear(&mut self) {
self.registry_index_timestamps.clear();
self.registry_crate_timestamps.clear();
self.registry_src_timestamps.clear();
self.git_db_timestamps.clear();
self.git_checkout_timestamps.clear();
}
/// Indicates the given [`RegistryIndex`] has been used right now.
pub fn mark_registry_index_used(&mut self, registry_index: RegistryIndex) {
self.mark_registry_index_used_stamp(registry_index, None);
}
/// Indicates the given [`RegistryCrate`] has been used right now.
///
    /// This also implicitly marks the index as used.
pub fn mark_registry_crate_used(&mut self, registry_crate: RegistryCrate) {
self.mark_registry_crate_used_stamp(registry_crate, None);
}
/// Indicates the given [`RegistrySrc`] has been used right now.
///
    /// This also implicitly marks the index as used.
pub fn mark_registry_src_used(&mut self, registry_src: RegistrySrc) {
self.mark_registry_src_used_stamp(registry_src, None);
}
/// Indicates the given [`GitCheckout`] has been used right now.
///
    /// This also implicitly marks the git db as used.
pub fn mark_git_checkout_used(&mut self, git_checkout: GitCheckout) {
self.mark_git_checkout_used_stamp(git_checkout, None);
}
/// Indicates the given [`RegistryIndex`] has been used with the given
/// time (or "now" if `None`).
pub fn mark_registry_index_used_stamp(
&mut self,
registry_index: RegistryIndex,
timestamp: Option<&SystemTime>,
) {
let timestamp = timestamp.map_or(self.now, to_timestamp);
self.registry_index_timestamps
.insert(registry_index, timestamp);
}
/// Indicates the given [`RegistryCrate`] has been used with the given
/// time (or "now" if `None`).
///
    /// This also implicitly marks the index as used.
pub fn mark_registry_crate_used_stamp(
&mut self,
registry_crate: RegistryCrate,
timestamp: Option<&SystemTime>,
) {
let timestamp = timestamp.map_or(self.now, to_timestamp);
let index = RegistryIndex {
encoded_registry_name: registry_crate.encoded_registry_name,
};
self.registry_index_timestamps.insert(index, timestamp);
self.registry_crate_timestamps
.insert(registry_crate, timestamp);
}
/// Indicates the given [`RegistrySrc`] has been used with the given
/// time (or "now" if `None`).
///
    /// This also implicitly marks the index as used.
pub fn mark_registry_src_used_stamp(
&mut self,
registry_src: RegistrySrc,
timestamp: Option<&SystemTime>,
) {
let timestamp = timestamp.map_or(self.now, to_timestamp);
let index = RegistryIndex {
encoded_registry_name: registry_src.encoded_registry_name,
};
self.registry_index_timestamps.insert(index, timestamp);
self.registry_src_timestamps.insert(registry_src, timestamp);
}
/// Indicates the given [`GitCheckout`] has been used with the given
/// time (or "now" if `None`).
///
    /// This also implicitly marks the git db as used.
pub fn mark_git_checkout_used_stamp(
&mut self,
git_checkout: GitCheckout,
timestamp: Option<&SystemTime>,
) {
let timestamp = timestamp.map_or(self.now, to_timestamp);
let db = GitDb {
encoded_git_name: git_checkout.encoded_git_name,
};
self.git_db_timestamps.insert(db, timestamp);
self.git_checkout_timestamps.insert(git_checkout, timestamp);
}
/// Saves all of the deferred information to the database.
///
/// This will also clear the state of `self`.
#[tracing::instrument(skip_all)]
pub fn save(&mut self, tracker: &mut GlobalCacheTracker) -> CargoResult<()> {
trace!(target: "gc", "saving last-use data");
if self.is_empty() {
return Ok(());
}
let tx = tracker.conn.transaction()?;
// These must run before the ones that refer to their IDs.
self.insert_registry_index_from_cache(&tx)?;
self.insert_git_db_from_cache(&tx)?;
self.insert_registry_crate_from_cache(&tx)?;
self.insert_registry_src_from_cache(&tx)?;
self.insert_git_checkout_from_cache(&tx)?;
tx.commit()?;
trace!(target: "gc", "last-use save complete");
Ok(())
}
/// Variant of [`DeferredGlobalLastUse::save`] that does not return an
/// error.
///
    /// Instead, this will log or display a warning to the user.
pub fn save_no_error(&mut self, gctx: &GlobalContext) {
if let Err(e) = self.save_with_gctx(gctx) {
// Because there is an assertion in auto-gc that checks if this is
// empty, be sure to clear it so that assertion doesn't fail.
self.clear();
if !self.save_err_has_warned {
if is_silent_error(&e) && gctx.shell().verbosity() != Verbosity::Verbose {
tracing::warn!("failed to save last-use data: {e:?}");
} else {
crate::display_warning_with_error(
"failed to save last-use data\n\
This may prevent cargo from accurately tracking what is being \
used in its global cache. This information is used for \
automatically removing unused data in the cache.",
&e,
&mut gctx.shell(),
);
self.save_err_has_warned = true;
}
}
}
}
fn save_with_gctx(&mut self, gctx: &GlobalContext) -> CargoResult<()> {
let mut tracker = gctx.global_cache_tracker()?;
self.save(&mut tracker)
}
/// Flushes all of the `registry_index_timestamps` to the database,
/// clearing `registry_index_timestamps`.
fn insert_registry_index_from_cache(&mut self, conn: &Connection) -> CargoResult<()> {
insert_or_update_parent!(
self,
conn,
"registry_index",
registry_index_timestamps,
registry_keys,
encoded_registry_name
);
}
/// Flushes all of the `git_db_timestamps` to the database,
    /// clearing `git_db_timestamps`.
fn insert_git_db_from_cache(&mut self, conn: &Connection) -> CargoResult<()> {
insert_or_update_parent!(
self,
conn,
"git_db",
git_db_timestamps,
git_keys,
encoded_git_name
);
}
/// Flushes all of the `registry_crate_timestamps` to the database,
    /// clearing `registry_crate_timestamps`.
fn insert_registry_crate_from_cache(&mut self, conn: &Connection) -> CargoResult<()> {
let registry_crate_timestamps = std::mem::take(&mut self.registry_crate_timestamps);
for (registry_crate, timestamp) in registry_crate_timestamps {
trace!(target: "gc", "insert registry crate {registry_crate:?} {timestamp}");
let registry_id = self.registry_id(conn, registry_crate.encoded_registry_name)?;
let mut stmt = conn.prepare_cached(
"INSERT INTO registry_crate (registry_id, name, size, timestamp)
VALUES (?1, ?2, ?3, ?4)
ON CONFLICT DO UPDATE SET timestamp=excluded.timestamp
WHERE timestamp < ?5
",
)?;
stmt.execute(params![
registry_id,
registry_crate.crate_filename,
registry_crate.size,
timestamp,
timestamp - UPDATE_RESOLUTION
])?;
}
Ok(())
}
/// Flushes all of the `registry_src_timestamps` to the database,
    /// clearing `registry_src_timestamps`.
fn insert_registry_src_from_cache(&mut self, conn: &Connection) -> CargoResult<()> {
let registry_src_timestamps = std::mem::take(&mut self.registry_src_timestamps);
for (registry_src, timestamp) in registry_src_timestamps {
trace!(target: "gc", "insert registry src {registry_src:?} {timestamp}");
let registry_id = self.registry_id(conn, registry_src.encoded_registry_name)?;
let mut stmt = conn.prepare_cached(
"INSERT INTO registry_src (registry_id, name, size, timestamp)
VALUES (?1, ?2, ?3, ?4)
ON CONFLICT DO UPDATE SET timestamp=excluded.timestamp
WHERE timestamp < ?5
",
)?;
stmt.execute(params![
registry_id,
registry_src.package_dir,
registry_src.size,
timestamp,
timestamp - UPDATE_RESOLUTION
])?;
}
Ok(())
}
/// Flushes all of the `git_checkout_timestamps` to the database,
    /// clearing `git_checkout_timestamps`.
fn insert_git_checkout_from_cache(&mut self, conn: &Connection) -> CargoResult<()> {
let git_checkout_timestamps = std::mem::take(&mut self.git_checkout_timestamps);
for (git_checkout, timestamp) in git_checkout_timestamps {
let git_id = self.git_id(conn, git_checkout.encoded_git_name)?;
let mut stmt = conn.prepare_cached(
"INSERT INTO git_checkout (git_id, name, size, timestamp)
VALUES (?1, ?2, ?3, ?4)
ON CONFLICT DO UPDATE SET timestamp=excluded.timestamp
WHERE timestamp < ?5",
)?;
stmt.execute(params![
git_id,
git_checkout.short_name,
git_checkout.size,
timestamp,
timestamp - UPDATE_RESOLUTION
])?;
}
Ok(())
}
/// Returns the numeric ID of the registry, either fetching from the local
/// cache, or getting it from the database.
///
/// It is an error if the registry does not exist.
fn registry_id(
&mut self,
conn: &Connection,
encoded_registry_name: InternedString,
) -> CargoResult<ParentId> {
match self.registry_keys.get(&encoded_registry_name) {
Some(i) => Ok(*i),
None => {
let Some(id) = GlobalCacheTracker::id_from_name(
conn,
REGISTRY_INDEX_TABLE,
&encoded_registry_name,
)?
else {
bail!("expected registry_index {encoded_registry_name} to exist, but wasn't found");
};
self.registry_keys.insert(encoded_registry_name, id);
Ok(id)
}
}
}
/// Returns the numeric ID of the git db, either fetching from the local
/// cache, or getting it from the database.
///
/// It is an error if the git db does not exist.
fn git_id(
&mut self,
conn: &Connection,
encoded_git_name: InternedString,
) -> CargoResult<ParentId> {
match self.git_keys.get(&encoded_git_name) {
Some(i) => Ok(*i),
None => {
let Some(id) =
GlobalCacheTracker::id_from_name(conn, GIT_DB_TABLE, &encoded_git_name)?
else {
bail!("expected git_db {encoded_git_name} to exist, but wasn't found")
};
self.git_keys.insert(encoded_git_name, id);
Ok(id)
}
}
}
}
/// Converts a [`SystemTime`] to a [`Timestamp`] which can be stored in the database.
fn to_timestamp(t: &SystemTime) -> Timestamp {
t.duration_since(SystemTime::UNIX_EPOCH)
.expect("invalid clock")
.as_secs()
}
/// Returns the current time.
///
/// This supports pretending that the time is different for testing using an
/// environment variable.
///
/// If possible, try to avoid calling this too often since accessing clocks
/// can be a little slow on some systems.
#[allow(clippy::disallowed_methods)]
fn now() -> Timestamp {
match std::env::var("__CARGO_TEST_LAST_USE_NOW") {
Ok(now) => now.parse().unwrap(),
Err(_) => to_timestamp(&SystemTime::now()),
}
}
/// Returns whether or not the given error should be silently logged, rather
/// than being displayed as a warning to the user.
///
/// In some situations, like a read-only global cache, we don't want to spam
/// the user with a warning. Once cargo has controllable lints, I think we
/// should consider changing this to always warn, but give the user an option
/// to silence the warning.
pub fn is_silent_error(e: &anyhow::Error) -> bool {
if let Some(e) = e.downcast_ref::<rusqlite::Error>() {
if matches!(
e.sqlite_error_code(),
Some(ErrorCode::CannotOpen | ErrorCode::ReadOnly)
) {
return true;
}
}
false
}
/// Returns the disk usage for a git checkout directory.
pub fn du_git_checkout(path: &Path) -> CargoResult<u64> {
    // `!.git` is used because clones typically use hardlinks for the git
    // contents.
    // TODO: Verify behavior on Windows.
    // TODO: Or even better, switch to worktrees, and remove this.
cargo_util::du(&path, &["!.git"])
}
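/// Returns the disk usage for the given path, dispatching to
/// [`du_git_checkout`] for git checkout directories and to a plain
/// [`cargo_util::du`] with no filters otherwise.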
fn du(path: &Path, table_name: &str) -> CargoResult<u64> {
if table_name == GIT_CO_TABLE {
du_git_checkout(path)
} else {
cargo_util::du(&path, &[])
}
}