Memory Mapped file segment / always appending
Johannes Lichtenberger
lichtenberger.johannes at gmail.com
Tue Jul 28 16:55:05 UTC 2020
I wonder if I should allocate just a few MB each time I'm appending at the
end of the file (that is, at the end of the memory segment). But that would
mean I'd have to close and recreate the mapped memory segment much more
often.
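Something like this minimal sketch is what I have in mind; I'm using plain
NIO's FileChannel.map as a stand-in here (the semantics are the same, except
that with the incubating foreign-memory API the old segment would
additionally be closed to unmap it eagerly). The class name and the chunk
size are made up:

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch: map a small window at the logical end of the file and remap a
// fresh window once the current one is full. Assumes single writes are
// always smaller than the window.
final class AppendWindow {
  private static final long CHUNK = 8L << 20; // a few MB per mapping

  private final FileChannel channel;
  private MappedByteBuffer window;
  private long windowStart; // file offset where the current window begins

  AppendWindow(final Path file) throws IOException {
    channel = FileChannel.open(file, StandardOpenOption.CREATE,
        StandardOpenOption.READ, StandardOpenOption.WRITE);
    windowStart = channel.size();
    window = channel.map(FileChannel.MapMode.READ_WRITE, windowStart, CHUNK);
  }

  void append(final byte[] bytes) throws IOException {
    if (window.remaining() < bytes.length) {
      // This is the cost in question: every overflow forces a new mmap
      // (and, with the foreign-memory API, an munmap of the old segment).
      windowStart += window.position();
      window = channel.map(FileChannel.MapMode.READ_WRITE, windowStart, CHUNK);
    }
    window.put(bytes);
  }

  long realLength() { // what the file should be truncated to on close
    return windowStart + window.position();
  }
}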
I think I have to share the reader/writer for the mapped memory segment
across all open transactions (only one open read-write transaction is
allowed per resource). That way I don't have to truncate all the time for
very short write transactions; the writer should only be closed when the
resource manager, which is basically a resource session and opens the
transactions, closes.
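The lifetime I have in mind would roughly look like this; the names are
invented and it's only a sketch of the ownership, not of the actual sirix
classes:

// The session owns the single shared writer; transactions only borrow
// it, so closing a transaction never triggers mmap/munmap + truncate.
final class ResourceSession implements AutoCloseable {

  interface PageWriter extends AutoCloseable {
    void write(byte[] page);

    @Override
    void close(); // truncates to the real length and unmaps, exactly once
  }

  private final PageWriter sharedWriter;
  private boolean closed;

  ResourceSession(final PageWriter writer) {
    this.sharedWriter = writer;
  }

  // Every transaction gets the same writer instance.
  PageWriter writer() {
    if (closed) {
      throw new IllegalStateException("session already closed");
    }
    return sharedWriter;
  }

  @Override
  public void close() {
    if (!closed) {
      closed = true;
      sharedWriter.close();
    }
  }
}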
A simple test looks something like this (not using JMH):
final var resource = "smallInsertions";

try (final var database = JsonTestHelper.getDatabase(PATHS.PATH1.getFile())) {
  database.createResource(ResourceConfiguration.newBuilder(resource)
                                               .storeDiffs(false)
                                               .hashKind(HashType.NONE)
                                               .build());

  try (final var manager = database.openResourceManager(resource);
       final var wtx = manager.beginNodeTrx()) {
    System.out.println("Start inserting");
    final long time = System.nanoTime();

    wtx.insertArrayAsFirstChild();

    var jsonObject = """
        {"item":"this is item 0", "package":"package", "kg":5}
        """.strip();

    wtx.insertSubtreeAsFirstChild(JsonShredder.createStringReader(jsonObject));

    for (int i = 0; i < 650_000; i++) {
      jsonObject = """
          {"item":"this is item %s", "package":"package", "kg":5}
          """.strip().formatted(i);

      wtx.insertSubtreeAsRightSibling(JsonShredder.createStringReader(jsonObject));
    }

    System.out.println("Done inserting ["
        + (System.nanoTime() - time) / 1_000_000 + "ms].");
  }
}
insertSubtreeAsFirstChild(...) (and likewise
insertSubtreeAsRightSibling(...)) internally commits the transaction, and
thus mmap/munmap plus truncate is called 650_000 times. That, of course,
wasn't a problem with the RandomAccessFile-based implementation.
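What I'll probably try is to make commit() pure bookkeeping and defer the
truncation to close(); a hypothetical sketch (names invented):

import java.io.IOException;
import java.nio.channels.FileChannel;

// commit() only advances the logical end of file; the single truncate
// happens in close() instead of once per commit.
final class DeferredTruncateWriter implements AutoCloseable {

  private final FileChannel channel;
  private long realLength; // bytes actually written so far

  DeferredTruncateWriter(final FileChannel channel, final long initialLength) {
    this.channel = channel;
    this.realLength = initialLength;
  }

  void commit(final long bytesAppended) {
    realLength += bytesAppended; // bookkeeping only -- no syscall here
  }

  @Override
  public void close() throws IOException {
    channel.truncate(realLength); // one truncate instead of 650_000
    channel.close();
  }
}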
Regardless of this issue, I wonder what a good size to map is. Maybe
Integer.MAX_VALUE is too much, but it also depends on the insertion rate.
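One policy that might amortize the remapping cost without paying for a huge
sparse mapping up front is doubling with a cap; the constants here are pure
guesses:

// Hypothetical sizing policy: start with a modest window, double it on
// every remap, and cap it well below Integer.MAX_VALUE.
static long nextMappingSize(final long currentSize) {
  final long initial = 16L << 20; // 16 MiB first mapping
  final long cap = 1L << 30;      // 1 GiB upper bound per mapping
  return currentSize <= 0 ? initial : Math.min(currentSize * 2, cap);
}

With 650_000 small records that remaps only a handful of times, while a low
insertion rate never allocates more than the initial window.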
Kind regards
Johannes
On Mon, 27 Jul 2020 at 19:48, Maurizio Cimadamore <
maurizio.cimadamore at oracle.com> wrote:
>
> On 27/07/2020 18:33, Johannes Lichtenberger wrote:
>
> To be fair, I don't know how to truncate it directly with Panama.
>
> I'm currently using the FileChannel API just for truncating during closing:
>
> try (final FileChannel outChan = new FileOutputStream(dataFile.toFile(), true).getChannel()) {
>     outChan.truncate(dataSegmentFileSize);
> } catch (IOException e) {
>     throw new SirixIOException(e);
> }
>
>
> But I guess it's also more of a conceptual issue (or should truncating be
> pretty fast?).
>
> FileChannel usage is correct. Yes, the flame graph seems to show that
> truncate takes quite a chunk of time, but I think that, unless you have
> some bug where you call it more often than really required, there's not
> much that can be done about it - e.g. I don't think that the performance
> issue you are seeing is caused by FileChannelImpl::truncate being slow.
>
> Maurizio
>
>
> I've committed the file:
> https://github.com/sirixdb/sirix/blob/master/Screenshot%20from%202020-07-27%2018-58-39.png
>
> kind regards
> Johannes
>
> On Mon, 27 Jul 2020 at 19:18, Maurizio Cimadamore <
> maurizio.cimadamore at oracle.com> wrote:
>
>>
>> On 27/07/2020 18:12, Johannes Lichtenberger wrote:
>> > Hi,
>> >
>> > As I'm always appending data to the end of a file, I came up with the
>> > idea to always map a segment in chunks of Integer.MAX_VALUE; that is,
>> > the segment always gets closed and a new segment gets created with
>> > twice the size (for reading/writing). When I'm closing the writer I
>> > simply truncate the file(s) to the real length.
>> >
>> > However, I just saw that it's horribly slow for a lot of small writes.
>> > That is, I'm always creating a read/write page reader and closing it
>> > again after a write. Truncating a lot of unused space over and over
>> > again is a real problem. I've attached a flame graph from YourKit
>> > showing that half of the time, over a large number of CPU samples, is
>> > spent truncating.
>>
>> Is the time spent in the Panama API, or do you mean that the code is
>> slow just natively (e.g. at the mmap level)?
>>
>> We can't see the attachment (the server truncates them :-) )
>>
>> Maurizio
>>
>> >
>> > Any ideas how to improve performance?
>> >
>> > The package is this one, and it's very small:
>> >
>> > https://github.com/sirixdb/sirix/tree/master/bundles/sirix-core/src/main/java/org/sirix/io/memorymapped
>> >
>> > kind regards
>> > Johannes
>>
>