Skip to content

Compaction deletes non-shared files#6060

Open
ArbaazKhan1 wants to merge 10 commits into
apache:mainfrom
ArbaazKhan1:accumulo-5387
Open

Compaction deletes non-shared files#6060
ArbaazKhan1 wants to merge 10 commits into
apache:mainfrom
ArbaazKhan1:accumulo-5387

Conversation

@ArbaazKhan1

Copy link
Copy Markdown
Contributor

Closes Issue #5387

Added shared marker to DataFileValue to track files used by single or multiple tablets. Compactions now delete non-shared files directly, instead of creating GC markers. This Pr adds the following changes:

  • Added Shared field to DataFileValue
  • Compactions attempt deletion for non-shared files
  • Clone, split and bulk import now set shared markers as needed

@keith-turner keith-turner left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will also need to modify the split code.

// File is not shared - try to delete directly
try {
Path filePath = new Path(file.getMetadataPath());
boolean deleted = ctx.getVolumeManager().deleteRecursively(filePath);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleting files here is not correct because the conditional mutation to the metadata table has not yet been made. Only after successfully making the conditional mutation will we know what was shared or not. Need to do something like the following.

  1. Modify conditional mutation to check that the file values (this checks the new shared field) are the same for the tablet. This atomically ensures that what is thought to be shared is actually shared when the metadata table update happens. Currently the conditional mutation only checks the set of files, not their values.
  2. After successfully submitting the condtional mutation look at the table metadata and the list of files compacting (from commitData) . Compute a set of files that is in commitData.inputPaths and is marked as not shared in the tablet metadata. If a file is not in tablet metadata, add it to the set because its shared status is unknown (this can happen because of failure and retry). This is the set of compacting files that are known for sure to not be shared, in the case of failure and retry this set should be empty.
  3. If the conditional mutation was not successful, this set above should be empty.
  4. Pass the set computed above to PutGcCandidates
  5. In PutGcCandidates delete non shared files and add gc delete markers for shared files. Doing the actual file deletes in PutGcCandidates makes reasoning about retries (caused by process dying in middle of fate step) much easier.

There is existing code for comparing the files values, but its not exactly what we need. Would probably need to create a new requireFiles(Map<StoredTabletFile, DataFileValue> files) method that uses code similar to this in its impl.

}
}

markSourceFilesAsShared(srcTableId, tableId, context, bw);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marking the source tablets as shared here has race conditions, these may not be the files in the clone. Also marking them as shared must be done using a conditional writer that only changes the marking if the file exists in the tablet w/ the same value (this check must be done by the conditional mutation).

The following conditions must be true for clone to avoid race conditions.

  1. Files are marked as shared before cloning.
  2. After copying the tablets file they must still exist in the source tablet.

Need to rework the code to do the following algorithm for each tablet.

  1. Before cloning mark all files in the source tablet as shared using a conditional mutation.
  2. Copy the source tablets files to the destination tablet.
  3. Read the source and desitation tablets and ensure the destitnations tablet has a subset of the sources files. If so this tablet was successfully cloned and we know the GC did not delete any files because the source still has them. Done with tablet and do not need to next step.
  4. There was a concurrent change w/ the source tablets files, possible that GC even deleted some of them. Delete the destination tablet and go back to step 1.

The current code does this w/o step 1. Also the current code batches tablet read/writes for efficiency. The code would probably be much easier to understand w/o the batching, but doing one tablet at a time w/o batching would be really slow. So need to modify the code work in step 1 and still do the batching.

@dlmarion

Copy link
Copy Markdown
Contributor

@ArbaazKhan1 - looks like this has a conflict now.

@ctubbsii ctubbsii added this to the 4.0.0 milestone Jun 18, 2026
time = Long.parseLong(ba[2]);
} else {
time = -1;
// Could be old format with time, or new format with shared

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should always encode the time. If there are 3 parts, then the DataFileValue was written before this code and shared is false. If there are 4 parts, then it was written with the new code.

@Override
public int hashCode() {
return Long.valueOf(size + numEntries).hashCode();
return Long.valueOf(size + numEntries + (shared ? 1 : 0)).hashCode();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to change this to Objects.hash(size, numEntries, shared)

Comment on lines +218 to +220
"Conditional mutation to mark files as shared was rejected for tablet {}, "
+ "tablet changed concurrently, clone will retry for this tablet",
srcTablet.getExtent());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better to return a boolean here for success/fail (see comment below). Could probably move the debug log statement down to the calling location too.

try {
try (TabletsMetadata tabletsMetadata = createCloneScanner(null, srcTableId, context)) {
for (TabletMetadata tablet : tabletsMetadata) {
markTabletFilesAsShared(context, tablet);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is a failure here, then you probably want a continue to restart the loop. I don't think you want to continue on a failure.

var tabletFilesMap = tablet.getFilesMap();
for (StoredTabletFile file : ecm.getJobFiles()) {
DataFileValue dfv = tabletFilesMap.get(file);
if (dfv == null || !dfv.isShared()) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think dfv should not be null. If it is, that seems like a larger issue. I don't think we should delete these files.

var tabletMutator = tabletsMutator.mutateTablet(getExtent()).requireAbsentOperation()
.requireCompaction(ecid).requireSame(tablet, LOCATION)
.requireFiles(commitData.getJobFiles());
.requireFiles(tablet.getFilesMap());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code before this change was validating that the files that are part of the compaction are part of the tablet. I think that should always be true. This change is checking that the tablets files are the same as when this method started. This might not be true as a compaction could have occurred adding a new file.


public PutGcCandidates(CompactionCommitData commitData, String refreshLocation) {
public PutGcCandidates(CompactionCommitData commitData, String refreshLocation,
ArrayList<StoredTabletFile> filesToDeleteViaGc) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding correctly, the filesToDeleteViaGc list is the list of non-shared files that we want to delete directly, not via the Garbage Collector. If that's the case, then I would suggest renaming this variable in all places to avoid confusion.

LOG.debug("Directly deleted non-shared compaction input file: {}",
jobFile.getFileName());
} else {
LOG.debug("Non-shared file {} was already absent (likely retry), no GC marker needed",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there may be other cases where this returns false. I think you should add the file to the fallback set here to create the delete marker.

Comment on lines +113 to +118
// Compute which files appear in more than one tablet range
Set<String> sharedFiles;
try (LoadMappingIterator lmiPass1 =
BulkSerialize.getUpdatedLoadMapping(bulkDir.toString(), bulkInfo.tableId, fs::open)) {
sharedFiles = computeSharedFiles(fateId, lmiPass1);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know why this is here, it's too bad that we have to read the underlying files twice.

@dlmarion

dlmarion commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

I didn't look at the tests yet. You have covered compactions, split, and bulk import. For merge, I guess the resulting tablet just contains all of the file references from the merged tablets? I'm just trying to determine if there is any chance of a collision on files with different shared values and how that's handled. It might be useful to see if you can add a test for this case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Allow compactions to delete non shared input files.

4 participants