Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dinit-iostream: Custom new special-purpose library #263

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

mobin-2008
Copy link
Collaborator

@mobin-2008 mobin-2008 commented Dec 16, 2023

This is replacement for C++ standard library iostream/fstream.

See dinit-iostream.h for more information and the usage.

Closes #240

Signed-off-by: Mobin "Hojjat" Aydinfar <[email protected]>

@mobin-2008 mobin-2008 added Enhancement/New Feature Improving things or introduce new feature A-Importance: High labels Dec 16, 2023
src/dinit-iostream.cc Outdated Show resolved Hide resolved
@mobin-2008 mobin-2008 force-pushed the iostream_replace branch 2 times, most recently from 85d018e to 60197e3 Compare December 17, 2023 11:03
Copy link
Owner

@davmac314 davmac314 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed mostly one file (dinit-iostream.h) and I have a lot of comments so I will stop here and let you address them before adding more.

A few general comments:

  • it seems to me like there's more in common in istream and ostream that could be moved into streambuf_base (and perhaps streambuf_base and fstream_base should be merged)
  • buffer handling seems to be suboptimal, for example the buffer isn't necessarily completely full before it is flushed. Eg. suppose I write 16380 bytes followed by 10 bytes followed by 512 bytes and then flush. That should need only two writes (buffer size is 16384) but it will do three.
  • "move" constructor doesn't make sense, I'm still waiting for explanation of why it's needed, and even if it is, why doesn't it transfer buffer ownership?
  • trying to write again after write failed previously is not a good idea, and trying to do a write of later data after earlier flush failed is very wrong - in the worst case it will skip the data from the buffer and still write the later data, so a chunk of written data will just be missing from the output.

src/dinit-iostream.cc Outdated Show resolved Hide resolved
src/includes/dinit-iostream.h Outdated Show resolved Hide resolved
src/includes/dinit-iostream.h Outdated Show resolved Hide resolved
src/includes/dinit-iostream.h Outdated Show resolved Hide resolved
src/includes/dinit-iostream.h Outdated Show resolved Hide resolved
src/includes/dinit-iostream.h Show resolved Hide resolved
src/includes/dinit-iostream.h Outdated Show resolved Hide resolved
src/includes/dinit-iostream.h Outdated Show resolved Hide resolved
src/includes/dinit-iostream.h Outdated Show resolved Hide resolved
src/dinit-iostream.cc Outdated Show resolved Hide resolved
@davmac314
Copy link
Owner

davmac314 commented Dec 18, 2023

  • buffer handling seems to be suboptimal, for example the buffer isn't necessarily completely full before it is flushed. Eg. suppose I write 16380 bytes followed by 10 bytes followed by 512 bytes and then flush. That should need only two writes (buffer size is 16384) but it will do three.

Sorry, this example might not be right. Consider instead: 16380 bytes followed by 8 bytes followed by 16380 bytes again.

@mobin-2008 mobin-2008 force-pushed the iostream_replace branch 3 times, most recently from 1b56d2b to 1749cb1 Compare December 24, 2023 11:00
@davmac314
Copy link
Owner

@mobin-2008 you made a comment here but it seems to have been deleted. Let me know when the PR is ready for review.

@mobin-2008 mobin-2008 force-pushed the iostream_replace branch 4 times, most recently from 507ff03 to 20bc0e7 Compare December 28, 2023 18:55
@mobin-2008
Copy link
Collaborator Author

Ready for review.

Changelog from your last review:

  1. MAX_TYPE_NUMDIGITS() algorithm has been changed to recursive integer logarithm like function.
  2. buf->reset() has been removed (because make no difference)
  3. Documention has been reworded in terms of vocabulary and grammar.
  4. A check has been added to fstream_base::open() functions to disallow null pointer/empty paths.
  5. A check has been added to put() function to avoid undefined behavior in strlen() of null string and Segmentation fault on appending null string into buffer.
  6. Exception that get thrown by iostream library has been specialized.

@mobin-2008 mobin-2008 force-pushed the iostream_replace branch 2 times, most recently from 605c1f9 to 0777709 Compare December 28, 2023 21:18
Copy link
Owner

@davmac314 davmac314 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like you've made some changes unrelated to previous review, but haven't addressed all the issues raised in the previous review.

Please make sure to address all issues, and avoid making new changes that weren't discussed unless they solve important problems. Once issues from last review are resolved I will review again.

@davmac314
Copy link
Owner

davmac314 commented Dec 29, 2023

I will try to fully explain each issue, so hopefully there's no confusion:

Constructors setting buffer_failbit

The constructors should be properly documented (i.e. document that buffer allocation is done when the constructor is called, but may fail, leaving buffer_failbit set). I see now that you did improve the behaviour, but I didn't see that before because you didn't list it as one of the changes and you included many changes so there was a lot of noise.

(That is one reason why you should not add additional changes when responding to a review. It complicates things, and makes it harder to check that the issues bought up in the review last time were resolved properly).

streambuf set_buf/set_buf_with_owner function

streambuf function set_buf (now called "set_buf_with_owner") is dangerous because it removes the previous buffer even though it may have data in it, without flushing it first. I don't understand the use case for it (i.e. why it's needed) so I don't know whether the best answer is just to have set_buf (... with_owner) flush the buffer, or even just document that you must flush the (current) buffer before calling set_buf; if there is no use case then it should really just be removed (we don't want extra code that isn't used anyway).

i.e. the solution should be one of the following:

  • remove it
  • make it protected instead of public (if it only needs to be called by subclasses)
  • make it flush the old buffer
  • document that the old buffer should be flushed before calling

If the solution is any option other than "remove it", you should be able to explain why.

exceptions function

The exceptions function should behave consistently, it doesn't make sense to have a magic value (0) mean one thing and any other value mean something different. If the function does two different things it should be two different functions.

But: I also really don't see why toggling exception bits is useful, why can't I just set the bits I want set?

Example: if I write a function which takes an istream parameter (by reference) how can I ensure the behaviour (exceptions vs non)? Eg how could I ensure throw_on_buffer_fail is set for example?

void read_some_data(istream &instream) {
    // how can I turn on buffer_failbit in exceptions?

    // If I do:
    instream.exceptions(buffer_failbit);
    // Now I don't know whether buffer failure throws exceptions or not!
}

@mobin-2008
Copy link
Collaborator Author

I will try to fully explain each issue, so hopefully there's no confusion:

Constructors setting buffer_failbit

The constructors should be properly documented (i.e. document that buffer allocation is done when the constructor is called, but may fail, leaving buffer_failbit set). I see now that you did improve the behaviour, but I didn't see that before because you didn't list it as one of the changes and you included many changes so there was a lot of noise.

(That is one reason why you should not add additional changes when responding to a review. It complicates things, and makes it harder to check that the issues bought up in the review last time were resolved properly).

construction examples now include a check to catching buffer allocation failures.

Also I wrote that message to notice there is some other changes with fixing all previous reviews but looks like it made into confusion.

streambuf set_buf/set_buf_with_owner function

streambuf function set_buf (now called "set_buf_with_owner") is dangerous because it removes the previous buffer even
...

i.e. the solution should be one of the following:

  • remove it
    ...

getting raw pointer through get_buf() is enough and I removed them.

exceptions function

The exceptions function should behave consistently, it doesn't make sense to have a magic value (0) mean one thing and any other value mean something different. If the function does two different things it should be two different functions.

But: I also really don't see why toggling exception bits is useful, why can't I just set the bits I want set?

Example: if I write a function which takes an istream parameter (by reference) how can I ensure the behaviour (exceptions vs non)? Eg how could I ensure throw_on_buffer_fail is set for example?

void read_some_data(istream &instream) {
    // how can I turn on buffer_failbit in exceptions?

    // If I do:
    instream.exceptions(buffer_failbit);
    // Now I don't know whether buffer failure throws exceptions or not!
}

exceptions() is renamed to throw_on_states() for better meaning. Also it accepts a boolean to throw or not throw on requested bits instead of toggle behavior:

...
// Caller can be sure about that this call will mark eofbit for throwing `dio::iostream_eof` exception for example
instream.throw_on_states(diostates::eofbit, true);
...

Also I added a new function called clear_throws() which disables all situation can throw exceptions instead of magic 0 parameter.
I think it's enough.

@mobin-2008 mobin-2008 force-pushed the iostream_replace branch 2 times, most recently from 83a7d11 to 6d78876 Compare December 30, 2023 07:46
@davmac314
Copy link
Owner

I think it's enough.

Ok, but I've seen more pushes come through since. So I will wait until you request review.

@mobin-2008
Copy link
Collaborator Author

So it's allowed? (you didn't answer!)

If it's allowed, what happens to the buffer (does it get flushed or cleared?) and the error states?

If something is allowed, the behaviour should be sensible (and documented).
If it is not allowed, that should be documented ("Must not be called if the stream is already open.")

I think this note is good enough:

    // Note: Opening an already opened stream is not allowed and caller must close the stream
    // before open it again.

@mobin-2008 mobin-2008 force-pushed the iostream_replace branch 2 times, most recently from 946525a to 4922d35 Compare April 12, 2024 06:47
Copy link
Owner

@davmac314 davmac314 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you changed some logic in ostream::put when pushing 4922d35

It looks like it fixes a potential problem where partial writes would result in an error (from write/write_nx), so that's ok, but please make an explanatory comment (on the PR) any time you make changes to code in any PR outside what we've already discussed in the review.

This is getting very close to being acceptable but there are still some issues that I've highlighted.

src/includes/dinit-iostream.h Outdated Show resolved Hide resolved
src/includes/dinit-iostream.h Outdated Show resolved Hide resolved
src/includes/dinit-iostream.h Show resolved Hide resolved
src/includes/dinit-iostream.h Outdated Show resolved Hide resolved
Copy link
Owner

@davmac314 davmac314 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few more things.

I don't think I will include this in the 1.0 release, there's not enough time to fully test the changes (and it's still not optimal for handling non-blocking I/O). But we can finalise the PR and have it ready to merge with just these fixes.

src/dinit-iostream.cc Show resolved Hide resolved
src/dinit-iostream.cc Outdated Show resolved Hide resolved
return (pre_buffered_chars > 0) ? pre_buffered_chars : -1;
}
msg += res;
pre_buffered_chars += res;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This usage of pre_buffered_chars doesn't match the comment describing the variable above (line 64) nor does it make sense given the name of the variable.

Use a different variable, or rename it and adjust the comment

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable is now called cur_msg_in_buf_count but it still counts bytes that have already been written (flushed) as well as bytes that have been buffered, so the name still doesn't reflect how the variable is used.

Something like output_count is less descriptive but at least has a clear enough meaning. Also the comment on the variable should be clearer, I'll note that separately.

return -1;
}
else while (count > 0) {
ssize_t res = bp_sys::write(fd, msg, count);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is enough space in the buffer to hold the remaining part of the message then it should be buffered not written immediately

Copy link
Owner

@davmac314 davmac314 Jun 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it always copies all of the message via the buffer before writing it.

The point was to not try to write(...) the remaining portion of the message if it could fit in the buffer. If it can't fit in the buffer, writing it directly is the right thing to do. (I.e. the whole point of buffering is to call write as few times as possible. If you can buffer the whole remaining portion, that's 0 calls to write which is optimal. If you can't fit the message in the buffer at all, you can write the whole thing with a single call to write).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it the right fix: 20c831d ?

Copy link
Owner

@davmac314 davmac314 Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better, but still not really optimal in at least one case (put large amount of data when buffer is empty). As I said, the whole point is to call write as few times as possible (technically that isn't perfectly optimal in all cases, but it's a good-enough heuristic).

So: think about a range of possible scenarios and whether the implementation has the fewest possible calls to write. (Edit) And, if it has the fewest possible calls to write, does it also have the least possible copying of data into the buffer?

src/includes/dinit-iostream.h Outdated Show resolved Hide resolved
src/includes/dinit-iostream.h Outdated Show resolved Hide resolved
src/tests/test-bpsys.cc Show resolved Hide resolved
// Internal function to load from file descriptor (a wrapper for buf->fill())
int load_into_buf(unsigned len) noexcept;

// Throw exception based on current state
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mention what the states parameter is for

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't addressed?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You did change the comment, it still doesn't explain what the parameter is for.

// Internal function to appending into the buffer.
ssize_t put(const char *msg, size_t count) noexcept;

// Throw exception based on current state
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mention what the states parameter is for

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't addressed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the right fix: 135c5fa ?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, part of the point of documentation on a function is to explain to someone who is writing code which calls the function what they can/should pass as arguments.

The comment "states parameter is for throwing appropriate exceptions" doesn't help. It doesn't explain what the value should be.

It's a bitmask right? So mention that fact, and say where the possible bit values come from (the io_states enum? and are all of them meaningful in this case or not?). Give enough information so that nobody has to look at the code of the function to figure out how they should call it.

@mobin-2008 mobin-2008 force-pushed the iostream_replace branch 2 times, most recently from 4f1e65a to 78ef8af Compare May 16, 2024 19:46
@mobin-2008
Copy link
Collaborator Author

CI failure is unrelated.

@mobin-2008 mobin-2008 requested a review from davmac314 May 16, 2024 19:58
This is replacement for C++ standard library iostream/fstream.

See dinit-iostream.h for more information and the usage.

Signed-off-by: Mobin "Hojjat" Aydinfar <[email protected]>
Required for testing dinit-iostream.

Signed-off-by: Mobin "Hojjat" Aydinfar <[email protected]>
Signed-off-by: Mobin "Hojjat" Aydinfar <[email protected]>
Copy link
Owner

@davmac314 davmac314 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like you didn't properly understand some of my last review comments. I hope it is clearer now.

Please don't rebase, just add new commits to the PR so I can actually see the changes that you've made. You can rebase to clean up once it's approved. (As I mentioned already though, I don't think we will merge before 1.0 release).

src/dinit-iostream.cc Outdated Show resolved Hide resolved
return -1;
}
else while (count > 0) {
ssize_t res = bp_sys::write(fd, msg, count);
Copy link
Owner

@davmac314 davmac314 Jun 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it always copies all of the message via the buffer before writing it.

The point was to not try to write(...) the remaining portion of the message if it could fit in the buffer. If it can't fit in the buffer, writing it directly is the right thing to do. (I.e. the whole point of buffering is to call write as few times as possible. If you can buffer the whole remaining portion, that's 0 calls to write which is optimal. If you can't fit the message in the buffer at all, you can write the whole thing with a single call to write).

src/tests/test-bpsys.cc Outdated Show resolved Hide resolved
// Internal function to appending into the buffer.
ssize_t put(const char *msg, size_t count) noexcept;

// Throw exception based on current state
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't addressed?

// Internal function to load from file descriptor (a wrapper for buf->fill())
int load_into_buf(unsigned len) noexcept;

// Throw exception based on current state
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't addressed?

@davmac314
Copy link
Owner

Also if anything is unclear, please ask for clarification.

@mobin-2008
Copy link
Collaborator Author

I'm going to work on this again.

According to CODE-STYLE:

>   2.2. Omit "else" if the body of the associated "if" does not fall through (i.e. if it always
>        either returns or throws an exception).

Signed-off-by: Mobin Aydinfar <[email protected]>
…e in ostream::put()

With this change, The best case write count will be 1 instead of 2.

Signed-off-by: Mobin Aydinfar <[email protected]>
@davmac314
Copy link
Owner

I'm going to work on this again.

That's fine, but please take extra care to do things properly rather than relying on my review. The amount of reviewing I have done on this isn't sustainable going forward, I have too much to do and other PRs rolling in.

To be clear, this won't be merged for the 1.0 release, as it's too big a change to incorporate at this stage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Importance: High Enhancement/New Feature Improving things or introduce new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Replace use of iostream-based classes
2 participants