[asio-users] [http] Formal review of Boost.Http
Bjorn Reese
2015-08-07 07:08:20 UTC
Dear Boost and ASIO communities.

The formal review of Vinícius dos Santos Oliveira's Boost.Http library
starts today, August 7th, and ends on Sunday, August 16th.

Boost.Http is designed for various forms of modern HTTP interaction,
from normal HTTP requests, through HTTP chunking and pipelining, to
upgrading to other web protocols such as WebSocket. The library builds
on top of Boost.ASIO and follows ASIO's threading model.

The two basic building blocks are http::socket, which is a socket that
talks HTTP, and http::message, which contains HTTP metadata and body
information. You can use these building blocks to build an HTTP server
that fits your exact needs; for instance, an embedded HTTP server for
a REST API. Boost.Http comes with a lightweight HTTP server and a
static file server.

Currently, Boost.Http is limited to server-side interaction, but the
design principles used extend to the client side as well.

Boost.Http was originally developed as part of GSoC 2014 and Vinícius
has continued to develop and improve the library since then.

The documentation can be found here:

http://boostgsoc14.github.io/boost.http/

The source code can be found here:

https://github.com/BoostGSoC14/boost.http

The source code is built using CMake, but Boost.Build support is in the
pipeline (it is already done for the documentation).

Please answer the following questions:

1. Should Boost.Http be accepted into Boost? Please state all
conditions for acceptance explicitly.

2. What is your evaluation of the design?

3. What is your evaluation of the implementation?

4. What is your evaluation of the documentation?

5. What is your evaluation of the potential usefulness of the library?

6. Did you try to use the library? With what compiler? Did you have
any problems?

7. How much effort did you put into your evaluation? A glance? A quick
reading? In-depth study?

8. Are you knowledgeable about the problem domain?
Niall Douglas
2015-08-07 15:53:15 UTC
On 7 Aug 2015 at 9:08, Bjorn Reese wrote:

> The source code is built using CMake, but Boost.Build support is in the
> pipeline (it is already done for the documentation).

I'll give my frank opinion about Http, and note that I expressed this
privately by email some months ago, but my opinion was not heeded
before this library was presented for review. That has unfortunately
forced me to make these opinions public.

I do not believe Http is ready for peer review for these reasons, and
I personally would urge you to withdraw it until you have fixed these
issues and then come back to us. If you do not withdraw it, I will
vote for rejection.


1. You require a C++ 11 compiler and are very much an extension of
ASIO throughout, and then don't take advantage of ASIO's extensive
internal C++ 11 vs Boost switching infrastructure. On the second page
of your documentation you state:

"std::error_code, for instance, cannot be transparently replaced by
boost::system::error_code within ASIO-using code"

This is utterly untrue, and I told you that several months ago. ASIO
itself maps some error_code implementation into namespace asio - go
look at its config.hpp. Therefore you simply use whatever error_code
ASIO itself uses, using the same internal STL abstraction
infrastructure ASIO uses. The same goes for every other instance
where you are using Boost dependencies.
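
To illustrate the kind of switching layer I mean, here is a rough sketch
(the macro and namespace names are hypothetical and purely illustrative,
not ASIO's or Http's actual identifiers):

    // Sketch of a config-driven error_code alias. BOOST_HTTP_STANDALONE is
    // a made-up macro, shown only to illustrate the switching pattern.
    #if defined(BOOST_HTTP_STANDALONE)
    # include <system_error>
    namespace boost { namespace http {
    using error_code = std::error_code;
    using system_error = std::system_error;
    }} // namespace boost::http
    #else
    # include <boost/system/error_code.hpp>
    # include <boost/system/system_error.hpp>
    namespace boost { namespace http {
    using error_code = boost::system::error_code;
    using system_error = boost::system::system_error;
    }} // namespace boost::http
    #endif

Library code would then spell the type boost::http::error_code and pick up
whichever implementation the build selected, mirroring the switching that
ASIO's own config machinery performs.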

I said several months ago that you should properly finish the ASIO
integration by ripping out all your current hard coded usage of Boost
and using ASIO's own STL abstraction layer, and thereby making Http
identical in supported platforms and configurations to ASIO (i.e.
standalone C++ 11 or with-Boost) reusing the same ASIO configuration
macros. I then suggested it would be a very short step to port Http
to C++ 03 as you are not using anything which particularly demands
C++ 11 outside the examples and tests and which ASIO hasn't
abstracted for you.

Http should be a model EXTENSION of ASIO in the cleanest possible
sense in my opinion. You're not far from achieving it, and I really
think you need to finish it.


2. You claim Boost.Build doesn't work outside the Boost tree. This is
untrue - Boost.Build doesn't need Boost. I don't mind cmake for the
unit testing alone as a temporary stopgap until you bite the
Boost.Build bullet before it enters Boost, but I draw the line at
requiring cmake for normal builds. Before presenting for review,
normal library builds cannot use cmake, OR the library should be
header-only like Hana.


3. ... which leads me to ask why Http is not a header-only library
like ASIO. That would eliminate your Boost.Build requirement in the
strictest sense. Standalone ASIO doesn't need Boost.Build. Neither
should Http.

You mention in the FAQ that the reason it is not header-only is mainly
the use of a third-party HTTP C parser. It could be insufficient
familiarity on my part with the vagaries of parsing HTTP, but it has
never struck me as difficult to parse - you don't even need to
tokenise, and indeed my first instinct would be that it is better NOT
to tokenise HTTP, as that is both slower and opens a wider attack
surface than a hand-written "stupid" parser.

(BTW static custom error code categories work just fine header-only
if correctly set up. Indeed Boost.System can be configured to be
header-only.)
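
As an illustration of that parenthetical, a header-only category can be
done along these lines (a minimal sketch; the category name and error
values are made up):

    // Minimal header-only error category sketch. The function-local static
    // yields one category instance per process even when this header is
    // included from many translation units.
    #include <string>
    #include <system_error>

    class http_category_impl : public std::error_category {
    public:
        const char* name() const noexcept override { return "http"; }
        std::string message(int ev) const override {
            switch (ev) {
            case 1: return "malformed start line";
            case 2: return "header field too large";
            default: return "unknown HTTP error";
            }
        }
    };

    inline const std::error_category& http_category() {
        static http_category_impl instance;
        return instance;
    }

    inline std::error_code make_http_error(int ev) {
        return std::error_code(ev, http_category());
    }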


4. Http is the first line of parsing code in a world full of
malicious exploitation - just this week all Android devices were
pwned with a single malformed MMS message causing a buffer overflow.
Anything parsing untrusted input needs to be regularly input fuzz
tested, period, and very especially Http. Until Http is running at
least one of the below fuzz tools (American Fuzzy Lop or LLVM
libFuzzer) at least weekly on a CI, it is not safe to allow into
production code.

https://svn.boost.org/trac/boost/wiki/BestPracticeHandbook#a7.SAFETY:Stronglyconsideranightlyorweeklyinputfuzzautomatedtestifyourlibraryisabletoacceptuntrustedinput
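
For concreteness, a libFuzzer harness for the parsing layer can be as
small as the sketch below. It assumes the embedded third-party parser
exposes the node.js-style http_parser C API; adjust the entry point to
whatever Http actually wraps.

    // Build with e.g. clang++ -fsanitize=fuzzer,address,undefined.
    // The parser calls shown follow the node.js http_parser C API and are
    // used here only to illustrate feeding raw untrusted bytes to the parser.
    #include <cstddef>
    #include <cstdint>

    #include "http_parser.h"

    extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
        http_parser parser;
        http_parser_settings settings{};  // no callbacks; only exercise parsing
        http_parser_init(&parser, HTTP_REQUEST);
        http_parser_execute(&parser, &settings,
                            reinterpret_cast<const char*>(data), size);
        return 0;  // the sanitizers flag any overflow or UB triggered above
    }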


Until you address a minimum of items 1 and 4, I am very sorry but I
must vote for rejection.

Other notes:

* I don't understand why you cannot issue more than one async read at
a time nor async write at a time. In something like pipelined HTTP
that seems like a very common pattern. I smell a potential design
flaw, and while the docs mention a "queue_socket" I can't see such a
thing in the reference section.

* Your tutorial examples use "using namespace std;" at global level.
That's a showstopper of bad practice, and you need to purge all of
those. Same for "using namespace boost;" at global level.

* Your reference section is one very long thin list. You might want
to use more horizontal space.

* I personally found the design rationale not useful. I specifically
want to know:

1. Why did Http choose ASIO as my base over leading competitor X
and Y?
2. What sacrifices occurred due to that choice? i.e. things I'd
love weren't the case.
3. What alternatives may be better suited for the user examining
Http for suitability and why? i.e. make your documentation useful to
people with a problem to solve, not just a howto manual.

* Benchmarks, especially latency ones which are important for many
uses of HTTP are missing. I'd particularly like to know confidence
intervals on the statistical distribution so I know worst case
outcomes.

* Personally speaking, I'd like if in your tutorial you took ten
self-contained steps in building a simple HTTP fileserver, each with
a finished working program at the end of each section. Like the ASIO
tutorial does. In particular, I'd like to see layers which add on
etags and other of the fancy facilities in stages.

* I would personally worry about investing on developing code based
on Http until you have HTTP 2.0 support in there, only because I
would suspect you might need to break the API a bit to get the
all-binary format of HTTP 2.0 working as it's such a change from HTTP
1.0. It may merely be paranoia, but I suspect I wouldn't be alone on
this silent assumption.

* You're still missing CI testing with valgrind, thread sanitiser,
coveralls.io coverage testing, etc etc all the stuff from
https://svn.boost.org/trac/boost/wiki/BestPracticeHandbook.


I hope all the above isn't too disheartening Vinícius. If you took
six weeks extra to finish the port of Http to ASIO proper, added a
Jamfile for the main library build, and patched in some input fuzz
testing (I don't mind if the tests fail across the board BTW, I just
want to see the HTTP parser fuzz tested regularly) I would have no
objection to Http being reviewed and I believe you would get a
conditional acceptance from me at least. As Http currently stands, I
don't believe it is ready for review.

I should also add that I have been following this project intently
since its inception, and I am looking forward to integrating an Http
Filesystem backend into AFIO as soon as Http passes Boost review and
gains a client-side API.

Which, for the record, I believe it will. Your overall design is very
solid, you had an excellent mentor during GSoC, and all you need is
(quite a lot) more polish. I don't find any fundamental design
mistakes in Http apart from the bad design smell surrounding the lack
of concurrency in async read and write, and even that could just be a
documentation problem.

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/
Vinícius dos Santos Oliveira
2015-08-07 22:29:58 UTC
2015-08-07 12:53 GMT-03:00 Niall Douglas <***@nedprod.com>:

> I do not believe Http is ready for peer review for these reasons, and
> I personally would urge you to withdraw it until you have fixed these
> issues and then come back to us. If you do not withdraw it, I will
> vote for rejection.
>

Can I persuade you to review the Boost.Http design? I'd reject it myself
if you found any fundamental flaw in the design. And even if you're going
to reject the library because of the lack of tooling (not directly related
to design or documentation, which are the most important items in any
library), it'd be useful to know what needs to be addressed in the design
before I submit again (in case the submission is rejected).

1. You require a C++ 11 compiler and are very much an extension of
> ASIO throughout, and then don't take advantage of ASIO's extensive
> internal C++ 11 vs Boost switching infrastructure.


It's not my intention to make Boost.Http work without Boost. I use Boost
as a dependency and I'm not willing to change that (unless the pieces I use
from Boost enter the C++ standard).

About C++03 support, it's not required for Boost acceptance, so I can add
it later. I want to increase test coverage a lot before porting the code to
C++03, so I don't introduce accidental bugs in the process.

In the case that the abstractions I need enter the C++ standard, maybe your
APIBind will help me.

On the second page
> of your documentation you state:
>
> "std::error_code, for instance, cannot be transparently replaced by
> boost::system::error_code within ASIO-using code"
>
> This is utterly untrue, and I told you that several months ago. ASIO
> itself maps some error_code implementation into namespace asio - go
> look at its config.hpp. Therefore you simply use whatever error_code
> ASIO itself uses, using the same internal STL abstraction
> infrastructure ASIO uses. The same goes for every other instance
> where you are using Boost dependencies.
>

I tried to find such macros on Asio and I couldn't find any for error_code:
http://www.boost.org/doc/libs/1_58_0/doc/html/boost_asio/overview/cpp2011.html

Now that you mentioned config.hpp, I used my grep/find skills and I found:
BOOST_ASIO_HAS_STD_SYSTEM_ERROR

But I just don't find any reference to BOOST_ASIO_HAS_STD_SYSTEM_ERROR
outside of config.hpp. Even if it was used, it is undocumented, which means
it is an implementation detail that won't be guaranteed to be there forever
(accidental behaviour versus a documented guarantee).

I said several months ago that you should properly finish the ASIO
> integration by ripping out all your current hard coded usage of Boost
> and using ASIO's own STL abstraction layer, and thereby making Http
> identical in supported platforms and configurations to ASIO (i.e.
> standalone C++ 11 or with-Boost) reusing the same ASIO configuration
> macros. I then suggested it would be a very short step to port Http
> to C++ 03 as you are not using anything which particularly demands
> C++ 11 outside the examples and tests and which ASIO hasn't
> abstracted for you.
>
> Http should be a model EXTENSION of ASIO in the cleanest possible
> sense in my opinion. You're not far from achieving it, and I really
> think you need to finish it.
>

Delaying acceptance of the library will only delay its usage. The design of
the core abstractions is finished, and I fail to see why acceptance should
be delayed while I improve implementation details.

Standalone Boost.Http isn't as exciting as a WebSocket implementation.
There is a roadmap of the features:
https://boostgsoc14.github.io/boost.http/design_choices/roadmap.html

C++03 is on the roadmap, but standalone Boost.Http is not.

2. You claim Boost.Build doesn't work outside the Boost tree. This is
> untrue - Boost.Build doesn't need Boost. I don't mind cmake for the
> unit testing alone as a temporary stopgap until you bite the
> Boost.Build bullet before it enters Boost, but I draw the line at
> requiring cmake for normal builds. Before presenting for review
> normal library builds cannot use cmake OR should be header only like
> Hana.
>

CMake will be replaced by Boost.Build before any integration.

3. ... which leads me to asking why is Http not a header only library
> like ASIO? That would eliminate your Boost.Build requirement in the
> strictest sense. Standalone ASIO doesn't need Boost.Build. Neither
> should Http.
>
> You mention in the FAQ the reason why not header only is mainly due
> to the use of a third party HTTP C parser. It could be insufficient
> familiarity on my part with the vagaries of parsing HTTP, but it has
> never struck me as difficult to parse - you don't even need to
> tokenise, and indeed my first instinct would be it is better to NOT
> tokenise HTTP as that is both slower and opens a wider attack surface
> to a hand written "stupid" parser.
>
> (BTW static custom error code categories work just fine header only
> if correctly set up. Indeed Boost.System can be configured to be
> header only)
>

This parser is well tested and widely used. I want to increase test
coverage a lot before replacing the parser with one of my own. The HTTP spec
used to be very confusing before RFC 7230. Only now is it easy to implement
all the corner cases; previously, you needed a lot of experience.

4. Http is the first line of parsing code in a world full of
> malicious exploitation - just this week all Android devices were
> pwned with a single malformed MMS message causing a buffer overflow.
> Anything parsing untrusted input needs to be regularly input fuzz
> tested period, and very especially Http. Until Http is running at
> least one of the below fuzz tools (american fuzzy lop or LLVM
> libfuzzer) at least weekly on a CI it is not safe to allow into
> production code.
>
> https://svn.boost.org/trac/boost/wiki/BestPracticeHandbook#a7.SAFETY:Stronglyconsideranightlyorweeklyinputfuzzautomatedtestifyourlibraryisabletoacceptuntrustedinput
>

That's why this first implementation uses a proven parser that is very well
tested. All tests for the Boost.Http socket will run with varying buffer
sizes (from 1 to 2047). Of course I do enjoy tests and I'm constantly
implementing more (the latest experiment was using sanitizers).

Until you address a minimum of items 1 and 4, I am very sorry but I
> must vote for rejection.
>
> Other notes:
>
> * I don't understand why you cannot issue more than one async read at
> a time nor async write at a time. In something like pipelined HTTP
> that seems like a very common pattern. I smell a potential design
> flaw, and while the docs mention a "queue_socket" I can't see such a
> thing in the reference section.
>

Too much synchronization and buffering.

The pipeline has the following requests:

A => B => C

If I were to allow replies to request B, the design would be more complex,
increasing the difficulty of implementing alternative backends and
complicating everyone's life. Also, the reply to request B cannot actually
be sent while the reply to request A is on hold. This means that the whole
response to request B would need buffering, and if response B is a live
video stream, this means an implicit crash (one the user cannot foresee)
after an unfortunate memory exhaustion. I couldn't claim a **lightweight**
server if this excessive buffering were to occur. I couldn't claim the
project is appropriate for embedded devices if I had chosen the buffering
approach. It's a trade-off. I should write about it in the "design choices"
section of the documentation.

HTTP/2.0 does have proper multiplexing, and it can be more useful to respond
to several requests without "head-of-line blocking".

A queue socket is a concept; I don't provide one. I can look into how to
make the documentation less confusing by adding a page exclusively dedicated
to explaining the problem it solves. Would that work for you?

* Your tutorial examples use "using namespace std;" at global level.
> That's a showstopper of bad practice, and you need to purge all of
> those. Same for "using namespace boost;" at global level.
>

The tutorials should be as small as possible so as not to scare the users.

But I can put in a comment saying "you should avoid using namespace". Would
that help?

* Your reference section is one very long thin list. You might want
> to use more horizontal space.
>

I'd be happy already if I knew how to remove the TOC from the reference
page. It'd be much cleaner already.

* I personally found the design rationale not useful. I specifically
> want to know:
>
> 1. Why did Http choose ASIO as my base over leading competitor X
> and Y?
> 2. What sacrifices occurred due to that choice? i.e. things I'd
> love weren't the case.
>

I assume I don't need to reply to these questions during the review, given
that the reviewers may already be experienced with the answers. However,
these are indeed good questions that I could add to the "design choices"
page. Thank you.

3. What alternatives may be better suited for the user examining
> Http for suitability and why? i.e. make your documentation useful to
> people with a problem to solve, not just a howto manual.
>

I don't quite understand this point.

* Benchmarks, especially latency ones which are important for many
> uses of HTTP are missing. I'd particularly like to know confidence
> intervals on the statistical distribution so I know worst case
> outcomes.
>

These values will change when I replace the parser used now. Also, they can
change based on the buffer sizes and parser options, which the user is/will
be able to change.

Nevertheless, I do want to do some serious benchmarking, especially because
it'll give me a reference point when I write my own parser and want to
compare the performance speedup. I can do a rough benchmark this weekend and
show the results.

* Personally speaking, I'd like if in your tutorial you took ten
> self-contained steps in building a simple HTTP fileserver, each with
> a finished working program at the end of each section. Like the ASIO
> tutorial does. In particular, I'd like to see layers which add on
> etags and other of the fancy facilities in stages.
>

That's a good idea, thank you.

I won't replace the current tutorial, though, as it teaches the main
players in HTTP. I can, however, add a second tutorial showing how to
implement a **simple** file server.

* I would personally worry about investing on developing code based
> on Http until you have HTTP 2.0 support in there, only because I
> would suspect you might need to break the API a bit to get the
> all-binary format of HTTP 2.0 working as it's such a change from HTTP
> 1.0. It may merely be paranoia, but I suspect I wouldn't be alone on
> this silent assumption.
>

You're right about being paranoid here. I was too. But I've spent some time
researching HTTP/2.0 and I became confident about the current design. You
could vote for conditional acceptance based on HTTP/2.0 support, and I'd
think that is fair.

* You're still missing CI testing with valgrind, thread sanitiser,
> coveralls.io coverage testing, etc etc all the stuff from
> https://svn.boost.org/trac/boost/wiki/BestPracticeHandbook.
>

I already started the experiments with the sanitizers, but I ran into a
problem with Boost.Test and the experiment was put on hold. I'll come back
to it later.

I hope all the above isn't too disheartening Vinícius.


Not at all. I think you're the person who has dedicated the second-most
time to reviewing my work, and this isn't disheartening at all. It's just
that your feedback is usually about tooling and not the design itself
(everybody has their preferences).

If you took
> six weeks extra to finish the port of Http to ASIO proper,


I disagree about this point. Boost.Http is properly using Asio. Just
because it won't support standalone Asio, it doesn't mean it's improperly
using Asio.

added a
> Jamfile for the main library build,


Consider this done. I won't integrate into Boost without removing CMake
first. This was already partially stated in Bjorn's announcement (but
somewhat implicit, to be fair).


--
Vinícius dos Santos Oliveira
https://about.me/vinipsmaker
Lee Clagett
2015-08-08 03:10:34 UTC
On Fri, Aug 7, 2015 at 6:29 PM, Vinícius dos Santos Oliveira <
***@users.sf.net> wrote:

> 2015-08-07 12:53 GMT-03:00 Niall Douglas <***@nedprod.com>:
>
>> I do not believe Http is ready for peer review for these reasons, and
>> I personally would urge you to withdraw it until you have fixed these
>> issues and then come back to us. If you do not withdraw it, I will
>> vote for rejection.
>>
> ...[large snip]...
>

>> * I don't understand why you cannot issue more than one async read at
>> a time nor async write at a time. In something like pipelined HTTP
>> that seems like a very common pattern. I smell a potential design
>> flaw, and while the docs mention a "queue_socket" I can't see such a
>> thing in the reference section.
>>
>
> Too much synchronization and buffering.
>
> The pipeline has the following requests:
>
> A => B => C
>
> If I were to allow replies to request B, the design would be more complex,
> increasing the difficult to implement alternative backends and everyone's
> life. Also, the reply to request B cannot actually be sent while the reply
> to request A is on hold. This means that the whole response to request B
> would need buffering and if response B is a video live stream, this means
> an implicit (the user cannot know) crash after an unfortunate memory
> exhaustion. I couldn't claim a **lightweight** server if this excessive
> buffering was to occur. I couldn't claim the project is appropriate for
> embedded devices if I had chosen the buffering approach. It's a trade-off.
> I should write about it in the "design choice" of the documentation.
>
> HTTP/2.0 does have proper multiplexing and it can be more useful to
> respond several requests without "head of line blocking".
>
> A queue socket is a concept, I don't provide one. I can look into how make
> the documentation less confusing by adding a page exclusively dedicated to
> explain the problem it solves. Would that work for you?
>
>
FWIW it should be possible to do client- and server-side pipelining with
this design right now. The reads and writes of the HTTP Socket concept
should be (and currently are by `basic_socket`) handled independently.
Since the notification for a write completion is based on the lower-layer
write completion, and NOT on whether an HTTP response was received, multiple
(pipelined) requests should be possible by a client. Similarly, this design
should allow servers to handle pipelined requests in parallel. Providing a
framework for client or server pipelining will require more work, and
should probably not be a part of the Socket concept. In fact -

I think the ServerSocket concept should be removed entirely. The
`http::basic_socket<...>` models both concepts, so there's lots of
server-specific code in it, which seems confusing to me (is this really a
"basic" socket?). I think standalone functions for typical client and server
operations can be implemented as "composed" operations similar to
`asio::async_write`. The documentation will have to be explicit about what
the composed function is doing so that users know whether the Socket concept
has outstanding reads or writes (and therefore cannot do certain operations
until the handler is invoked). In fact, if there is a server operation that
_cannot_ be done with the Socket concept, then the concept probably needs
to be re-designed.
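
To make the "composed operation" suggestion concrete, a standalone function
along these lines is what I have in mind. It is a toy sketch against a
generic AsyncWriteStream, not Boost.Http's actual API:

    // Hypothetical free function: formats a complete response and drives the
    // stream with asio::async_write, reporting a single completion to the
    // caller once the whole response has been written.
    #include <boost/asio.hpp>
    #include <memory>
    #include <string>
    #include <utility>

    template <class Stream, class Handler>
    void async_write_simple_response(Stream& stream, std::string body,
                                     Handler handler) {
        const auto length = std::to_string(body.size());
        auto buffer = std::make_shared<std::string>(
            "HTTP/1.1 200 OK\r\nContent-Length: " + length + "\r\n\r\n"
            + std::move(body));
        boost::asio::async_write(
            stream, boost::asio::buffer(*buffer),
            [buffer, handler](boost::system::error_code ec,
                              std::size_t /*bytes_transferred*/) mutable {
                // One completion for the whole composed operation, however
                // many lower-layer writes it took.
                handler(ec);
            });
    }

Until the handler fires, the caller must not start another write on the same
stream, which is exactly the property the documentation would need to spell
out.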

Lee
Lee Clagett
2015-08-08 03:49:01 UTC
Sorry for any duplicates and top posting, but I didn't reply-all, so boost
got dropped (which was probably more important). And this likely ruins some
mail clients (again sorry!).

On Fri, Aug 7, 2015 at 11:10 PM, Lee Clagett <***@leeclagett.com> wrote:

> On Fri, Aug 7, 2015 at 6:29 PM, Vinícius dos Santos Oliveira <
> ***@users.sf.net> wrote:
>
>> 2015-08-07 12:53 GMT-03:00 Niall Douglas <***@nedprod.com>:
>>
>>> I do not believe Http is ready for peer review for these reasons, and
>>> I personally would urge you to withdraw it until you have fixed these
>>> issues and then come back to us. If you do not withdraw it, I will
>>> vote for rejection.
>>>
>> ...[large snip]...
>>
>
>>> * I don't understand why you cannot issue more than one async read at
>>> a time nor async write at a time. In something like pipelined HTTP
>>> that seems like a very common pattern. I smell a potential design
>>> flaw, and while the docs mention a "queue_socket" I can't see such a
>>> thing in the reference section.
>>>
>>
>> Too much synchronization and buffering.
>>
>> The pipeline has the following requests:
>>
>> A => B => C
>>
>> If I were to allow replies to request B, the design would be more
>> complex, increasing the difficult to implement alternative backends and
>> everyone's life. Also, the reply to request B cannot actually be sent while
>> the reply to request A is on hold. This means that the whole response to
>> request B would need buffering and if response B is a video live stream,
>> this means an implicit (the user cannot know) crash after an unfortunate
>> memory exhaustion. I couldn't claim a **lightweight** server if this
>> excessive buffering was to occur. I couldn't claim the project is
>> appropriate for embedded devices if I had chosen the buffering approach.
>> It's a trade-off. I should write about it in the "design choice" of the
>> documentation.
>>
>> HTTP/2.0 does have proper multiplexing and it can be more useful to
>> respond several requests without "head of line blocking".
>>
>> A queue socket is a concept, I don't provide one. I can look into how
>> make the documentation less confusing by adding a page exclusively
>> dedicated to explain the problem it solves. Would that work for you?
>>
>>
> FWIW it should be possible to do client and server side pipelining with
> this design right now. The reads and writes of the HTTP Socket concept
> should be (and currently are by `basic_socket`) handled independently.
> Since the notification for a write completion is based on the lower layer
> write completion, and NOT whether a HTTP response was received, multiple
> (pipelined) requests should be possible by a client. Similarly, this design
> should allow servers to handle pipelined requests in parallel. Providing a
> framework for client or server pipelining will require more work, and
> should probably not be a part of the Socket concept. In fact -
>
> I think the ServerSocket concept should be removed entirely. The
> `http::basic_socket<...>` models both concepts, so there's lots of server
> specific code in it, which seems confusing to me (is this really a "basic"
> socket?). I think standalone functions for typical client and server
> operations can be implemented as "composed" operations similar to
> `asio::async_write`. The documentation will have to be explicit on what the
> composed function is doing so that users know whether the Socket concept
> has outstanding reads or writes (and therefore cannot do certain operations
> until the handler is invoked). In fact, if there is a server operation that
> _cannot_ be done with the Socket concept, then the concept probably needs
> to be re-designed.
>
> Lee
>
Vinícius dos Santos Oliveira
2015-08-08 09:16:37 UTC
2015-08-08 0:10 GMT-03:00 Lee Clagett <***@leeclagett.com>:

> FWIW it should be possible to do client and server side pipelining with
> this design right now. The reads and writes of the HTTP Socket concept
> should be (and currently are by `basic_socket`) handled independently.
> Since the notification for a write completion is based on the lower layer
> write completion, and NOT whether a HTTP response was received, multiple
> (pipelined) requests should be possible by a client. Similarly, this design
> should allow servers to handle pipelined requests in parallel. Providing a
> framework for client or server pipelining will require more work, and
> should probably not be a part of the Socket concept. In fact -
>

The current design **does** support pipelining. Some HTTP clients disable
pipelining because some buggy servers will get confused. This won't happen
with Boost.Http (there are tests to ensure pipelining is working).

What the current design does **not** support is handling pipelined requests
concurrently. When you start reading a request, the write_state[1] will
change to finished until the full request is received, so you can't get
multiple requests while you haven't issued the replies. This behaviour is
explained in the ServerSocket concept[2].

There are just too many ways to allow concurrent HTTP pipelining that would
be transparent to the user, but all of them are heavier. If I were to allow
handling multiple concurrent pipelined HTTP requests, I'd require user
cooperation, so the abstraction doesn't become so heavy. The user would
need to be aware of whether the reply for some request can already be
delivered or not. Of course this means the socket would need to remember
order and attach some id to the request messages. It can be problematic in
alternative backends because the id could be of a different type, which
makes it difficult to use the same message type with different HTTP
backends. What do you think? The design would just be more complex for not
much gain. HTTP/2.0 allows real multiplexing and doesn't show this problem.

I think the ServerSocket concept should be removed entirely. The
> `http::basic_socket<...>` models both concepts, so theres lots of server
> specific code in it, which seems confusing to me (is this really a "basic"
> socket?).
>

basic_socket uses the basic_ prefix, just like basic_string.

I think standalone functions for typical client and server operations can
> be implemented as "composed" operations similar to `asio::async_write`. The
> documentation will have to be explicit on what the composed function is
> doing so that users know whether the Socket concept has outstanding reads
> or writes (and therefore cannot do certain operations until the handler is
> invoked). In fact, if there is a server operation that _cannot_ be done
> with the Socket concept, then the concept probably needs to be re-designed.
>

About "users know whether..." seems like a bad idea if I want to deliver
multiple backends. Some implementation details should just be abstracted
away.

I don't understand what you mean by "if there is a server operation that
cannot be done with the Socket concept [...]". Boost.Http has two concepts,
ServerSocket and Socket. The Socket concept will be useful for client-side
HTTP, and that's the reason why there are two concepts.

[1] https://boostgsoc14.github.io/boost.http/reference/write_state.html
[2] "The ServerSocket object MUST prevent the user from issuing new replies
while [...]",
https://boostgsoc14.github.io/boost.http/reference/server_socket_concept.html


--
Vinícius dos Santos Oliveira
https://about.me/vinipsmaker
Lee Clagett
2015-08-08 19:36:20 UTC
On Sat, Aug 8, 2015 at 5:16 AM, Vinícius dos Santos Oliveira <
***@users.sf.net> wrote:

> 2015-08-08 0:10 GMT-03:00 Lee Clagett <***@leeclagett.com>:
>
>> FWIW it should be possible to do client and server side pipelining with
>> this design right now. The reads and writes of the HTTP Socket concept
>> should be (and currently are by `basic_socket`) handled independently.
>> Since the notification for a write completion is based on the lower layer
>> write completion, and NOT whether a HTTP response was received, multiple
>> (pipelined) requests should be possible by a client. Similarly, this design
>> should allow servers to handle pipelined requests in parrallel. Providing a
>> framework for client or server pipelining will require more work, and
>> should probably not be a part of the Socket concept. In fact -
>>
>
> The current design **does** support pipelining. Some HTTP clients disable
> pipelining because some buggy servers will get confused. This won't happen
> with Boost.Http (there are tests to ensure pipelining is working).
>
> What the current design does **not** support is handling pipelined
> requests concurrently. When you starting read a request, the write_state[1]
> will change to finished until the fully request is received, so you can't
> get multiple request while you don't issue the replies. This behaviour is
> explained in the ServerSocket concept[2].
>
> There are just too many ways to allow HTTP concurrent pipeline that would
> be transparent for the user, but all of them are heavier. If I was to allow
> handling multiple concurrent HTTP pipelined requests, I'd expose user
> cooperation, so the abstraction doesn't become so heavy. The user would
> need to be aware if the reply for some request can already be delivered or
> not. Of course this means the socket will need to remember order and attach
> some id to the request messages. It can be problematic in alternative
> backends because the id could be of a different type and makes it difficult
> to use the same message type with different HTTP backends. What do you
> think? The design would just be more complex for not much gain. HTTP/2.0
> allow real multiplexing and doesn't show this problem.
>

I agree that providing any pipelining framework in the library is probably
unnecessary right now. However, the design should not inhibit such
functionality from being added (which I do not think is a problem with this
design).



> I think the ServerSocket concept should be removed entirely. The
>> `http::basic_socket<...>` models both concepts, so there's lots of server
>> specific code in it, which seems confusing to me (is this really a "basic"
>> socket?).
>>
>
> basic_socket uses basic_ prefix just like basic_string.
>

I was asking whether basic_socket should NOT model the http::ServerSocket
concept. I understood the naming convention, and the decision to provide
the implementation as a configurable template. The problem is the lack of
composability. If a different http::Socket concept is desired (someone not
using ASIO, etc.), then the http::ServerSocket concept has to be
re-implemented as well, since there is no other implementation. The obvious
way to achieve composability is to use inheritance with
`http::basic_server_socket<Socket>`, where `Socket` is an http::Socket
concept. Inheritance in this situation has its drawbacks, for sure (a
virtual destructor should be considered).
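
Roughly what I mean, as a sketch (the names are hypothetical, not existing
Boost.Http interfaces):

    // Layer the server-side behaviour over any type modelling the
    // http::Socket concept, so a non-ASIO Socket model gets the server
    // operations without re-implementing them.
    template <class Socket>
    class basic_server_socket : public Socket {
    public:
        using Socket::Socket;  // reuse the underlying Socket's constructors

        // Server-only operations would live here, written purely against the
        // Socket concept's read/write primitives. Declaration only; this is
        // a structural sketch, not an implementation.
        template <class Message, class Handler>
        void async_write_response(unsigned status_code, const Message& message,
                                  Handler&& handler);
    };
    // The caveat above: deleting an instance through a pointer to the Socket
    // base is only safe if Socket itself declares a virtual destructor.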



> I think standalone functions for typical client and server operations can
>> be implemented as "composed" operations similar to `asio::async_write`. The
>> documentation will have to be explicit on what the composed function is
>> doing so that users know whether the Socket concept has outstanding reads
>> or writes (and therefore cannot do certain operations until the handler is
>> invoked). In fact, if there is a server operation that _cannot_ be done
>> with the Socket concept, then the concept probably needs to be re-designed.
>>
>
> About "users know whether..." seems like a bad idea if I want to deliver
> multiple backends. Some implementation details should just be abstracted
> away.
>
http::SocketServer implementations manipulate the state of the
http::Socket (and indirectly the asio::tcp::socket), making some actions
unavailable after invoking those functions. For example, after invoking
`async_write_response`, no writes on the http::Socket/http::ServerSocket or
asio::ip::tcp::socket can occur until the callback is invoked, because it
calls asio::write on the asio::ip::tcp::socket directly. So unless I am
incorrect, it is important for the users to know what's being manipulated
(the effects).


I don't understand what you mean by "if there is a server operation that
> cannot be done with the Socket concept [...]". Boost.Http has two concepts,
> ServerSocket and Socket. Socket concept will be useful to client-side HTTP
> and that's the reason why there are two concepts.
>
>
The http::ServerSocket concept requires a http::Socket to write its
messages, but does the http::ServerSocket concept need to be a refinement
of the http::Socket concept? This implies a strong relationship. The
current specification looks like a http::ServerTraits concept - it's
specifying how a http::Socket is being used in situations specific to
servers. There doesn't appear to be any additional state tracking needed
for the http::ServerSocket functions, which is why I suggested standalone
functions (I didn't see the FileServer section). In fact, the implementation
can be inverted: make all of the current implementations for
http::ServerSocket standalone functions, and then have the default
http::ServerTraits implementation call the standalone versions of the
functions. The http::ServerTraits should then take a http::Socket as an
argument for each of the functions. This complete de-coupling would allow
someone to independently change the behavior of any related server
functions, even while using the same http::Socket object.
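
A skeleton of that inversion, again with hypothetical names rather than
existing Boost.Http API:

    // Server behaviour expressed as a standalone function template
    // (declaration only in this sketch)...
    #include <utility>

    template <class Socket, class Message, class Handler>
    void async_write_response(Socket& socket, unsigned status_code,
                              const Message& message, Handler&& handler);

    // ...and a default traits type that merely forwards to it. Substituting
    // different traits changes the server behaviour without touching the
    // http::Socket object itself.
    struct default_server_traits {
        template <class Socket, class Message, class Handler>
        static void async_write_response(Socket& socket, unsigned status_code,
                                         const Message& message,
                                         Handler&& handler) {
            ::async_write_response(socket, status_code, message,
                                   std::forward<Handler>(handler));
        }
    };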


[1] https://boostgsoc14.github.io/boost.http/reference/write_state.html
> [2] "The ServerSocket object MUST prevent the user from issuing new
> replies while [...]",
> https://boostgsoc14.github.io/boost.http/reference/server_socket_concept.html
>
>

Lee
Vinícius dos Santos Oliveira
2015-08-09 10:25:59 UTC
2015-08-08 16:36 GMT-03:00 Lee Clagett <***@leeclagett.com>:

> [...] The problem is the lack of composability. If a different
> http::Socket concept is desired (someone not using ASIO, etc.), then the
> http::ServerSocket concept has to be re-implemented also since there is no
> other implementation.
>

Exactly.

The obvious way to achieve composability is to use inheritance with:
> `http::basic_server_socket<Socket>` where `Socket` is a http::Socket
> concept. Inheritance in this situation has its drawbacks, for sure (a
> virtual destructor should be considered).
>

I don't know. The only thing I see that can be abstracted among all the
alternative HTTP backends is the management of read_state and write_state,
plus some auxiliary functions to check user intent (close connection...).
The auxiliary functions can be provided without a base class, and the
read_state and write_state code is too little to convince me it's worth it.

http::SocketServer implementations manipulate the state of the http::Socket
> (and indirectly the asio::tcp::socket), making some actions unavailable
> after invoking those functions. For example, after invoking
> `async_write_response`, no writes on the http::Socket/http::ServerSocket or
> asio::ip::tcp::socket can occur until the callback is invoked because it
> calls asio::write on the asio::ip::tcp::socket directly. So unless I am
> incorrect, it is important for the users to know whats being manipulated
> (the effects).
>

The documentation points to the use of composed operations:
https://boostgsoc14.github.io/boost.http/reference/basic_socket.html

It's akin to the Asio documentation on composed operations:
http://www.boost.org/doc/libs/1_58_0/doc/html/boost_asio/reference/async_write/overload1.html
(i.e. "the stream performs no other [...] until this operation completes")

But now that you mention it, I see that the documentation could receive some
improvements on this topic. The ServerSocket concept doesn't mention that
some models can make use of this restriction, nor is there any trait to
detect whether a model makes use of composed operations.

The http::ServerSocket concept requires a http::Socket to write its
> messages, but does the http::ServerSocket concept need to be a refinement
> of the http::Socket concept? This implies a strong relationship. The
> current specification looks like a http::ServerTraits concept - its
> specifying how a http::Socket is being used in situations specific to
> servers. There doesn't appear to be any additional state tracking needed
> for the http::ServerSocket functions, which is why I suggested standalone
> functions (I didn't see the FileServer section). In fact implementation can
> be inverted; make all of the current implementations for http::ServerSocket
> standalone functions, and then have the default http::ServerTraits
> implementation call the standalone versions of the functions. The
> http::ServerTraits should then take a http::Socket as an argument for each
> of the functions. This complete de-coupling would allow someone to
> independently change the behavior of any related server functions, even
> while using the same http::Socket object.
>

If I understood you correctly, you're proposing that all functions should
be free functions, not member functions, so that everybody can customize any
type to model a ServerSocket. And then you criticize the strong
relationship between Socket and ServerSocket.

Asio doesn't follow this model (async_read_some is a member function, not a
free function). It does appear to be an interesting idea, but I'm not sure
I'm prepared to solve this level of customization detail. The order of
includes could completely change the behaviour, and this is the least of the
problems. Can you elaborate more, so we can discuss?

And the reason behind Socket and ServerSocket is the future addition of a
ClientSocket, which will be a refinement of Socket.


--
Vinícius dos Santos Oliveira
https://about.me/vinipsmaker
Lee Clagett
2015-08-09 20:46:57 UTC
On Sun, Aug 9, 2015 at 6:24 AM, Vinícius dos Santos Oliveira <
***@gmail.com> wrote:

> 2015-08-08 16:36 GMT-03:00 Lee Clagett <***@leeclagett.com>:
>
> > [...] The problem is the lack of composability. If a different
> > http::Socket concept is desired (someone not using ASIO, etc.), then the
> > http::ServerSocket concept has to be re-implemented also since there is
> no
> > other implementation.
> >
>
> Exactly.
>
> The obvious way to achieve composability is to use inheritance with:
> > `http::basic_server_socket<Socket>` where `Socket` is a http::Socket
> > concept. Inheritance in this situation has its drawbacks, for sure (a
> > virtual destructor should be considered).
> >
>
> I don't know. The only place that I see that can be abstracted among all
> alternative http backends is management of read_state and write_state and
> some auxiliary functions to check user intent (close connection...). The
> auxiliary functions can be provided without a base class and the read_state
> and write_state code is too little to convince me it's worth.
>
> http::SocketServer implementations manipulate the state of the http::Socket
> > (and indirectly the asio::tcp::socket), making some actions unavailable
> > after invoking those functions. For example, after invoking
> > `async_write_response`, no writes on the http::Socket/http::ServerSocket
> or
> > asio::ip::tcp::socket can occur until the callback is invoked because it
> > calls asio::write on the asio::ip::tcp::socket directly. So unless I am
> > incorrect, it is important for the users to know whats being manipulated
> > (the effects).
> >
>
> The documentation points the use of composed operations:
> https://boostgsoc14.github.io/boost.http/reference/basic_socket.html
>
> It's akin to Asio documentation on composed operations:
>
> http://www.boost.org/doc/libs/1_58_0/doc/html/boost_asio/reference/async_write/overload1.html
> (i.e. "the stream performs no other [...] until this operation completes")
>
> But now that you mentioned, I see that documentation can receive some
> improvements on the topic. The ServerSocket concept doesn't mention that
> some models can make use of this restriction nor there is any trait to
> detect if a model makes use of composed operations.
>
> The http::ServerSocket concept requires a http::Socket to write its
> > messages, but does the http::ServerSocket concept need to be a refinement
> > of the http::Socket concept? This implies a strong relationship. The
> > current specification looks like a http::ServerTraits concept - its
> > specifying how a http::Socket is being used in situations specific to
> > servers. There doesn't appear to be any additional state tracking needed
> > for the http::ServerSocket functions, which is why I suggested standalone
> > functions (I didn't see the FileServer section). In fact implementation
> can
> > be inverted; make all of the current implementations for
> http::ServerSocket
> > standalone functions, and then have the default http::ServerTraits
> > implementation call the standalone versions of the functions. The
> > http::ServerTraits should then take a http::Socket as an argument for
> each
> > of the functions. This complete de-coupling would allow someone to
> > independently change the behavior of any related server functions, even
> > while using the same http::Socket object.
> >
>
> If I understood you correctly, you're proposing that all functions should
> be free functions, not member-functions, then everybody can customize any
> type to model a ServerSocket. And then you criticize the strong
> relationship between Socket and ServerSocket.
>
> Asio doesn't follow this model (async_read_some is a member-function, not a
> free function). It does appear to be an interesting idea, but I'm not sure
> I'm prepared to solve this detail of customization level. The order of
> includes could completely change the behaviour and this is the least of the
> problems. Can you elaborate more, then we can discuss?
>
> And the reason behind Socket and ServerSocket is the future addition of a
> ClientSocket, which will be a refinement of Socket.
>
>
The problem is that I incorrectly jumped to conclusions after seeing the
http::Socket concept, and assumed it had all the functionality to parse
HTTP (server and client), because that's exactly how I would try to do it
for flexibility purposes. This http::Socket concept explicitly does not
provide a mechanism to read the first line & headers of an HTTP message. I
expected the concept to handle the fields portion of the header at least,
since they are identical. This is why I thought the http::ServerSocket was
more like traits - I thought they were defining some convenience functions
on top of the more basic calls of the http::Socket concept.

Anyway -
Can the http::Socket concept define a function for reading the headers (they
should be identical)? OR can the first line of an HTTP message be read into
the empty "" string of the headers map? OR could the http::Socket require a
separate string for the first line?

If either of the last two were chosen, standalone functions could
"compose" against a http::Socket::async_read_headers function, where upon
completion they would move the first line somewhere else or parse it
apart into separate fields provided by the user. The major funky thing
is handling HEAD command responses, which I assume was going to have a
specific function in http::ClientSocket? This might need special state
manipulation regardless.


I'll probably have to start using it before saying much more I guess ...

Lee
Vinícius dos Santos Oliveira
2015-08-09 21:40:17 UTC
2015-08-09 17:46 GMT-03:00 Lee Clagett <***@leeclagett.com>:

> Can http::Socket concept define a function for reading the headers (they
> should be identical)?
>

It can, but it'd make the design more complex. I want to keep the design
simple, so users won't get confused. If

OR can the first line of a http message be read into
> the empty "" string of the headers map?
>

It's a clever idea. Too clever, indeed. And you end up losing all your type
safety. That's the level of cleverness that I don't appreciate.

OR could the http::Socket require a
> separate string for the first line?
>

It has the same problem of lacking type safety. A boost::variant could
be used to partially solve the problem, but the wrong type could still be
passed around.

If either of the last two were chosen - standalone functions could
> "compose" against a http::Socket::async_read_headers function, where upon
> completion they would move the first line to some place else or parse them
> apart them into separate fields provided by the user. The major funky thing
> is handling HEAD command responses, which I assume was going to have a
> specific function in http::ClientSocket ? This might need a special state
> manipulation regardless.
>

The major problem I see is the added complexity. There would be more values
for read_state and write_state (like first_line_read, headers_read). I
don't see a major benefit. The separate states for the body have a clear use
case (not exhausting memory, allowing live video streaming, and so on).

I'd like to read/hear opinions from other members of the list on whether
they appreciate the added complexity and think this is the right decision. I
can provide more information, and maybe we can discuss it.


--
Vinícius dos Santos Oliveira
https://about.me/vinipsmaker
Niall Douglas
2015-08-08 14:00:46 UTC
On 7 Aug 2015 at 19:29, Vinícius dos Santos Oliveira wrote:

> > I do not believe Http is ready for peer review for these reasons, and
> > I personally would urge you to withdraw it until you have fixed these
> > issues and then come back to us. If you do not withdraw it, I will
> > vote for rejection.
>
> Can I persuade you to review the Boost.Http design? I'd reject myself if
> you find any fundamental flaw on the design and even if you're going to
> reject the library because the lack of tooling (not directly related to
> design or documentation, which are the most important items on any
> library), it'd be useful to know what needs to be addressed on the design
> before I submit again (in the case the submission is rejected).

I think my earlier email was poorly phrased, now that I've slept on it. I
made my objections seem like they were about portability, when they
were really about the design and intent of Http.

What I should have said was this: Standalone ASIO is the reference
Networking TS implementation for standardisation. Http is currently a
set of extensions to Boost.ASIO specifically, when it *should* be a
set of extensions to the reference Networking TS implementation for
C++ standardisation. This is why your current design and architecture
are flawed, but luckily there isn't too much work needed to fix this:
finish the port of Http to the Networking TS.

The Internet of Things tends to involve lots of small somewhat
underpowered internet devices where C++ is likely going to play a
much more important role than historically in embedded systems. The
chances are that any Networking TS in C++ will be very extensively
used in such IoT applications. If there were also a high quality set
of extensions adding HTTP support to the Networking TS, I think your
chances are good that your Http library would be seriously considered
by WG21 for standardisation.

*This* is why I very strongly believe that Http needs to cleave
itself tightly to the Networking TS reference implementation, and not
to Boost.ASIO. Moreover, if you make Http an extension of the
Networking TS, you get Boost support with that for free - Http
becomes dual use.

> About C++03 support, it's not required for Boost acceptance, so I can add
> later. I want to increase tests coverage a lot before porting code to
> C++03, so I don't introduce accidental bugs in the process.

As a strong advocate for C++ 11 over 03 I can understand. However,
IoT developers tend to be stuck on some pretty ancient toolsets for
longer than others. 03 support I think would greatly increase your
potential userbase.

> In the case the abstractions I need enter on the C++ standard, maybe your
> APIbind will help me.

I actually would advise you to choose ASIO's STL abstraction
machinery over my own alternative APIBind in this instance.

> But I just don't find any reference to BOOST_ASIO_HAS_STD_SYSTEM_ERROR
> outside of config.hpp. Even if it was used, it is undocumented, which means
> it is an implementation detail that won't be guaranteed to be there forever
> (accidental behaviour versus a documented guarantee).

I'm pretty sure that implementation detail won't change in a hurry.
STL abstraction machinery changes as quickly as an STL does, i.e. every
decade or so.

> > https://svn.boost.org/trac/boost/wiki/BestPracticeHandbook#a7.SAFETY:Stronglyconsideranightlyorweeklyinputfuzzautomatedtestifyourlibraryisabletoacceptuntrustedinput
>
> That's why this first implementation uses a proven parser that is very well
> tested.

The buck always stops with the topmost layer, not with internally used
third-party libraries.

You NEED to fuzz test untrusted inputs when parsing HTTP. For all you
know, your STL implementation could be the thing with the overflow
bug, or the way in which you use it.

> > * I don't understand why you cannot issue more than one async read at
> > a time nor async write at a time. In something like pipelined HTTP
> > that seems like a very common pattern. I smell a potential design
> > flaw, and while the docs mention a "queue_socket" I can't see such a
> > thing in the reference section.
> >
>
> Too much synchronization and buffering.
>
> The pipeline has the following requests:
>
> A => B => C
>
> If I were to allow replies to request B, the design would be more complex,
> increasing the difficult to implement alternative backends and everyone's
> life. Also, the reply to request B cannot actually be sent while the reply
> to request A is on hold. This means that the whole response to request B
> would need buffering and if response B is a video live stream, this means
> an implicit (the user cannot know) crash after an unfortunate memory
> exhaustion. I couldn't claim a **lightweight** server if this excessive
> buffering was to occur. I couldn't claim the project is appropriate for
> embedded devices if I had chosen the buffering approach. It's a trade-off.
> I should write about it in the "design choice" of the documentation.
>
> HTTP/2.0 does have proper multiplexing and it can be more useful to respond
> several requests without "head of line blocking".

Ok, let's accept that HTTP before 2.0 can't multiplex (not my
experience personally, but I agree it's got big gotchas before 2.0).

You need a user facing API for Http which lets user code multiplex
where that is available, and not multiplex when that is not
available. In other words, identical code written against Http must
automatically be optimal in either connection case.

You're probably going to ask how I might implement that, right? My first
guess is that you need two completion handler layers and therefore two
completion dispatcher reactors: the bottom layer is the same as ASIO's and
uses the ASIO reactor, whilst the top layer, which users of Http see, is a
reactor operated by Http that is disconnected from, though driven by, the
ASIO reactor.

> A queue socket is a concept, I don't provide one. I can look into how make
> the documentation less confusing by adding a page exclusively dedicated to
> explain the problem it solves. Would that work for you?

It could be that this queue socket concept of yours is exactly what I just
proposed as two completion handler and reactor layers. In either
case, I don't think queue sockets should be a concept; I think they
need to be the core of your library's user-facing API.

If power users wish to bypass that user-facing API layer for less
hand-holding and less latency, that should also be possible. But I
think HTTP multiplexing should be assumed to be the default starting
point for Http applications. Once HTTP 2.0 is more common, it will
seem madness that Http's user-facing API is _not_ 100% multiplexing.

> * Your tutorial examples use "using namespace std;" at global level.
> > That's a showstopper of bad practice, and you need to purge all of
> > those. Same for "using namespace boost;" at global level.
> >
>
> The tutorials should be as small as possible to not scare the users.
>
> But I can put a comment "you should avoid using namespace". Does it help?

using namespace is fine at local scope. It's not fine at global
scope.

Replacing all global using namespace with local using namespace is
all you need to do.
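For example (illustrative only):

#include <iostream>
#include <string>

// Bad: at global scope the directive leaks into every file including this one.
// using namespace std;

int main()
{
    using namespace std; // Fine: the shortcut is confined to this function.
    string greeting = "hello";
    cout << greeting << '\n';
}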

> * Your reference section is one very long thin list. You might want
> > to use more horizontal space.
> >
>
> I'd be happy already if I knew how to remove the TOC from the reference
> page. It'd already be much cleaner.

Personally in AFIO I hack the CSS to make the list into columns. ASIO
employs a table. There are many solutions.

> * I personally found the design rationale not useful. I specifically
> > want to know:
> >
> > 1. Why did Http choose ASIO as my base over leading competitor X
> > and Y?
> > 2. What sacrifices occurred due to that choice? i.e. things I'd
> > love weren't the case.
> >
>
> I assume I don't need to reply to these questions during the review, given
> that the reviewers may already be familiar with the answers.

Well ... I'm not sure that's actually the case.

It's not that we don't have opinions on the answers; it's whether our
answers are different from yours. You'll see this during the
AFIO review later this month - I have some really weird opinions no
one else shares like choosing my own custom lightweight monadic
future-promise implementation over async_result. That will be the
source of much criticism I'm sure.

> 3. What alternatives may be better suited for the user examining
> > Http for suitability and why? i.e. make your documentation useful to
> > people with a problem to solve, not just a howto manual.
> >
>
> I don't quite understand this point.

What I'm really asking for is details of how, before starting Http,
you reviewed the options available and the designs they used, and
exactly why you made the design choices in Http you did, with
reference and comparison to prior art.

Like a literature review, but for programming libraries. Does this
make sense? It's partly to help us understand your choices, but also
to help those examining Http figure out whether it is the best
solution to their problem (which may be to use an alternative to
Http).

Documentation which is useful, rather than documentation which is
merely a reference.

> * Benchmarks, especially latency ones which are important for many
> > uses of HTTP are missing. I'd particularly like to know confidence
> > intervals on the statistical distribution so I know worst case
> > outcomes.
> >
>
> These values will change when I replace the parser used now. Also, they can
> change based on the buffer sizes and parser options, which the user is/will
> be able to change.
>
> Nevertheless, I do want to do some serious benchmarking, especially because
> it'll give me a reference point when I write my own parser and want to
> compare the performance speedup. I can do some rough benchmarks this weekend
> and show the results.

That would be interesting, but don't go crazy on it.

Also, please do enable the undefined behaviour sanitiser permanently,
including in release mode, and with full stack smashing protection,
before running benchmarks. Anything parsing malicious input should
have all those turned on where available.

> I hope all the above isn't too disheartening Vinícius.
>
> Not at all. I think you're the second person to dedicate this much time to
> reviewing my work and this isn't disheartening at all. Just that your
> feedback is usually around tooling and not the design itself (everybody has
> their preferences).

That's because I like the design a lot and have little bad to say
about it.

I've also been subscribed to your github since the beginning, and I
have been reviewing your code on and off as you wrote it. You're a
talented programmer.

> > Jamfile for the main library build,
>
> Consider this done. I won't integrate into Boost without removing CMake
> first. This was already partially stated in Bjorn's announcement (but
> somewhat implicit, to be fair).

I have no problem at all with dual build systems if you're happy
keeping both maintained.

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/
Vinícius dos Santos Oliveira
2015-08-08 16:48:04 UTC
Permalink
2015-08-08 11:00 GMT-03:00 Niall Douglas <***@nedprod.com>:

> As a strong advocate for C++ 11 over 03 I can understand. However,
> IoT developers tend to be stuck on some pretty ancient toolsets for
> longer than others. 03 support I think would greatly increase your
> potential userbase.
>

I do care about C++03. That's why it's on the roadmap. But the design
wouldn't change noticeably and that's why I believe Boost.Http is ready for
review. Some of the requirements you're complaining about aren't Boost
requirements. Boost.Http isn't perfect now (and it'll take maybe a few
years before I exhaust my plans for improvements), but it has a strong core
that is already useful by itself. Lots of Boost libraries continued to
improve with time. It's not like you only do bugfixing after a library is
accepted.

The buck always stops with the top most layer, not internally used
> third party libraries.
>
> You NEED to fuzz test untrusted inputs when parsing HTTP. For all you
> know, your STL implementation could be the thing with the overflow
> bug, or the way in which you use it.
>

I opened an issue, so I won't forget about it:
https://github.com/BoostGSoC14/boost.http/issues/12

Ok, let's accept that HTTP before 2.0 can't multiplex (not my
> experience personally, but I agree it's got big gotchas before 2.0).
>
> You need a user facing API for Http which lets user code multiplex
> where that is available, and not multiplex when that is not
> available. In other words, identical code written against Http must
> automatically be optimal in either connection case.
>

The current design already supports concurrent requests. Each socket is a
communication channel with your application and every time you can handle
another request, you open a new socket on which you'll do it. That's my plan
for HTTP 2.0. The advantage is that you don't need to associate an id with
every message (that's more lightweight and more portable across different
HTTP backends).

It could be this queue socket concept of yours is exactly what I just
> proposed as two completion handler and reactor layers. In either
> case, I don't think queue sockets should be a concept, I think they
> need to be the core of your library's user facing API.
>

And then you aren't following C++ rule #1 anymore: You only pay for
what you use. That's why Asio itself doesn't solve this problem for you.
You can use boost::http::basic_socket<queue_socket> if you need to work
around Asio composed operations at this level. All customization points are
there for anyone.

The queue socket is mentioned on the first page (the front) of the
documentation: https://boostgsoc14.github.io/boost.http/#boost_http.f0

Interesting links to follow:

- http://sourceforge.net/p/asio/mailman/message/32259256/
- http://sourceforge.net/p/axiomq/code/ci/master/tree/include/axiomq/basic_queue_socket.hpp
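For the curious, here is a rough sketch of the general idea behind a queue
socket (this is neither the axiomq code above nor anything shipped by
Boost.Http; the names are made up and it assumes all calls happen on one
implicit strand): writes are queued and issued strictly one at a time, so
several composed operations can be started without their output interleaving
on the underlying stream.

#include <boost/asio.hpp>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

class queue_socket
{
public:
    explicit queue_socket(boost::asio::ip::tcp::socket socket)
        : socket_(std::move(socket)) {}

    // Queue a buffer for writing. At most one async_write is in flight at a
    // time; queued writes go out in FIFO order, so callers may start several
    // logical operations without their bytes interleaving on the stream.
    template<class Handler>
    void async_write_queued(std::vector<char> data, Handler handler)
    {
        queue_.push_back(pending{std::move(data), std::move(handler)});
        if (!writing_)
            do_write();
    }

    boost::asio::ip::tcp::socket &next_layer() { return socket_; }

private:
    struct pending
    {
        std::vector<char> data;
        std::function<void(boost::system::error_code, std::size_t)> handler;
    };

    void do_write()
    {
        writing_ = true;
        boost::asio::async_write(
            socket_, boost::asio::buffer(queue_.front().data),
            [this](boost::system::error_code ec, std::size_t bytes)
            {
                pending completed = std::move(queue_.front());
                queue_.pop_front();
                writing_ = false;
                if (!ec && !queue_.empty())
                    do_write();               // keep draining the queue
                completed.handler(ec, bytes); // finally notify the caller
            });
    }

    boost::asio::ip::tcp::socket socket_;
    std::deque<pending> queue_;
    bool writing_ = false;
};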


> 3. What alternatives may be better suited for the user examining
> > > Http for suitability and why? i.e. make your documentation useful to
> > > people with a problem to solve, not just a howto manual.
> > >
> >
> > I don't quite understand this point.
>
> What I'm really asking for is details of how, before starting Http,
> you reviewed the options available and the designs they used, and
> exactly why you made the design choices in Http you did, with
> reference and comparison to prior art.
>
> Like a literature review, but for programming libraries. Does this
> make sense? It's partly to help us understand your choices, but also
> to help those examining Http figure out whether it is the best
> solution to their problem (which may be to use an alternative to
> Http).
>
> Documentation which is useful, rather than documentation which is
> merely a reference.
>

I see.

That's the entire point of the "design choices" chapter:
https://boostgsoc14.github.io/boost.http/design_choices.html

Of course I can always improve this chapter by explaining more and more.
This review is helping me gather more questions to answer, but I'm already
answering here. So you guys can just ask.



--
Vinícius dos Santos Oliveira
https://about.me/vinipsmaker
Niall Douglas
2015-08-11 17:35:38 UTC
Permalink
On 8 Aug 2015 at 13:48, Vinícius dos Santos Oliveira wrote:

> But the design
> wouldn't change noticeably and that's why I believe Boost.Http is ready for
> review. Some of the requirements you're complaining about aren't Boost
> requirements. Boost.Http isn't perfect now (and it'll take maybe a few
> years before I exhaust my plans for improvements), but it has a strong core
> that is already useful by itself. Lots of Boost libraries continued to
> improve with time. It's not like you only do bugfixing after a library is
> accepted.

All of this is true. And I am mindful I am complaining about a design
choice which is specifically that you are targeting Boost before
WG21.

However a Boost library is about intent and purpose and philosophy as
well as technical architecture and implementation. I personally
believe "Boost-ness" can mean a library should not be part of Boost
which is self-contradictory I know.

I appreciate that cleaving more closely to the Networking TS instead
of Boost.ASIO may seem like an internal implementation detail, not a
design detail. But it ticks the intent and purpose and philosophy
boxes for me and right now you're not ticking them. Plus, as I
mentioned, you get "free" Boost compatibility via the ASIO to
Boost.ASIO conversion scripts.

Where I'm really at is I think if Http is accepted you're going to
either have to ditch it and reimplement atop the Networking TS as
Chris folds the substantial changes WG21 will force onto ASIO into
Boost.ASIO, or end up refactoring Http to cleave more closely to the
Networking TS anyway.

I am therefore of the position it's more efficient to skip that and
just cleave right now to the Networking TS.

> The buck always stops with the top most layer, not internally used
> > third party libraries.
> >
> > You NEED to fuzz test untrusted inputs when parsing HTTP. For all you
> > know, your STL implementation could be the thing with the overflow
> > bug, or the way in which you use it.
> >
>
> I opened an issue, so I won't forget about it:
> https://github.com/BoostGSoC14/boost.http/issues/12

Thanks for that.

> > It could be this queue socket concept of yours is exactly what I just
> > proposed as two completion handler and reactor layers. In either
> > case, I don't think queue sockets should be a concept, I think they
> > need to be the core of your library's user facing API.
> >
>
> And then you aren't following C++ rule #1 anymore: You only pay for
> what you use. That's why Asio itself doesn't solve this problem for you.
> You can use boost::http::basic_socket<queue_socket> if you need to work
> around Asio composed operations at this level. All customization points are
> there for anyone.

I am finding myself unconvinced by your arguments here. What stands
in the way of a two layer API design? Bottom layer is racy but lowest
latency. Top layer is not racy, but adds some latency.

I think for the majority of HTTP users they just want it to work
without surprises to a high default performance level. If you look at
the history of the HTTP library support in Python you'll see what I
mean - firstly, it's surprisingly easy to get a HTTP library API
design wrong, even in a v2 refactor. And secondly that people need
both a stupid-simple API and a more bare metal API *simultaneously*
with HTTP, and therein lies the design gotcha.

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/
Vinícius dos Santos Oliveira
2015-08-13 11:58:26 UTC
Permalink
First of all, sorry to all members of the list for my unavailability. I
was planning to write an experimental HTTP 2.0 backend, so I could give more
confidence that this Boost.Http core I present for review is the right
abstraction.

Anyway, it looks like I took the wrong approach. I should have answered the
easiest questions first and implemented the HTTP 2.0 backend (experimental,
using an existing library and not Boost-quality) later.

2015-08-11 14:35 GMT-03:00 Niall Douglas <***@nedprod.com>:

> Where I'm really at is I think if Http is accepted you're going to
> either have to ditch it and reimplement atop the Networking TS as
> Chris folds the substantial changes WG21 will force onto ASIO into
> Boost.ASIO, or end up refactoring Http to cleave more closely to the
> Networking TS anyway.
>

I'm okay with that.

> > And then you aren't following C++ rule #1 anymore: You only pay for
> > what you use. That's why Asio itself doesn't solve this problem for you.
> > You can use boost::http::basic_socket<queue_socket> if you need to work
> > around Asio composed operations at this level. All customization points
> are
> > there for anyone.
>
> I am finding myself unconvinced by your arguments here. What stands
> in the way of a two layer API design? Bottom layer is racy but lowest
> latency. Top layer is not racy, but adds some latency.
>
> I think for the majority of HTTP users they just want it to work
> without surprises to a high default performance level. If you look at
> the history of the HTTP library support in Python you'll see what I
> mean - firstly, it's surprisingly easy to get a HTTP library API
> design wrong, even in a v2 refactor. And secondly that people need
> both a stupid-simple API and a more bare metal API *simultaneously*
> with HTTP, and therein lies the design gotcha.
>

If you want a high level API, you're going to use coroutines. There is
nothing in the wild so readable as coroutines. Coroutines are **the**
solution to solve spaghetti code in asynchronous abstractions. Lambdas and
futures will never be as readable as coroutines. Anyway...

If you want a high-level API, you're going to use coroutines, and the use of
coroutines will already suspend your code until the completion of the
previous operation. You end up not scheduling too many operations at once
(fewer resources consumed) and you are using the API the right way.
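As a minimal illustration of that point with plain Boost.Asio (not
Boost.Http) and stackful coroutines via boost::asio::spawn, each operation
suspends the coroutine until it completes, so the next one is only scheduled
afterwards:

// Requires linking against Boost.Coroutine/Boost.Context.
#include <boost/asio.hpp>
#include <boost/asio/spawn.hpp>
#include <array>
#include <cstddef>
#include <iostream>
#include <string>

int main()
{
    namespace asio = boost::asio;
    asio::io_service ios;

    asio::spawn(ios, [&ios](asio::yield_context yield)
    {
        boost::system::error_code ec;
        asio::ip::tcp::resolver resolver(ios);
        asio::ip::tcp::resolver::query query("www.boost.org", "80");
        auto endpoint_it = resolver.async_resolve(query, yield[ec]);
        if (ec) return;

        asio::ip::tcp::socket socket(ios);
        asio::async_connect(socket, endpoint_it, yield[ec]);
        if (ec) return;

        // The coroutine is suspended here until the whole request is written...
        std::string request = "GET / HTTP/1.1\r\nHost: www.boost.org\r\n"
                              "Connection: close\r\n\r\n";
        asio::async_write(socket, asio::buffer(request), yield[ec]);
        if (ec) return;

        // ...and only then is the read scheduled: one operation at a time.
        std::array<char, 4096> buf;
        std::size_t n = socket.async_read_some(asio::buffer(buf), yield[ec]);
        if (!ec)
            std::cout.write(buf.data(), n);
    });

    ios.run();
}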

If you want to not use coroutines and still have a somewhat high-level API,
just change the underlying socket. It's not a problem. The only problem I
see here is the lack of a page just documenting composed operations, given
the confusion that arose on this matter.

Really, you should NOT pay for what you do NOT use. Eventually we'll have
coroutines in the language, so you won't even be able to do something more
efficient and will end up just using coroutines. And with the design I
propose, you won't even pay for scheduling/storing multiple operations that
just cannot be used right now. If a "pay for what I don't use" design was
what Boost wanted, I believe Boost.Asio would have done differently (fair
enough that Boost.Asio is a low-level library).

Now, about a real high-level API. Boost.Http is somewhat low-level, but not
for the reasons given. I can provide a higher-level API and it won't
change the points you're against. If you see the Boost.Http roadmap, you'll
notice where Boost.Http is really low-level (lack of a request router, form
parsing, HTTP session management...). This kind of stuff can get hugely
polemical and I think it's very unwise to integrate it all at once. You'd
be comparing frameworks that are completely different (like Python's
Django or Flask) and asking "hey, what is the **right** choice?". This kind
of question is really unhelpful. These frameworks continue to evolve and
sometimes they break their API, or even completely new approaches arise (like
the recent rise in popularity of web microframeworks). I'd rather provide
really generic and flexible building blocks than state that my view of web
development is correct. Not too long ago, LAMP was a very popular solution,
and this solution assumed you would use the MySQL database, but sometimes you
do not even need a database.

You should check the answer for "Why isn't a router available?" on the
Boost.Http FAQ: https://boostgsoc14.github.io/boost.http/design_choices.html

It's very polemical and I need to develop a NEW approach, not import the
design from some place or another. I need to reconcile current approaches
(I'd like to use the word paradigm here). Not only reconcile them, but I
need to allow some kind of collaboration. And it's C++, so it's harder. It's
not harder because it's C++. It's harder because the C++ community takes
software development very seriously. And then, it's not just C++, it's
Boost, the small group within C++ that is known in the community as the
group who strives to deliver even higher quality software.

If we stick to what we need for now, I believe the correct question to
focus on is "If I need to communicate HTTP messages, what is the correct
approach?". It's what we need now: pass HTTP messages around. And then, the
HTTP protocol may not even be involved; that's why I focused so much on
allowing alternative HTTP backends. Extremely detailed and careful
requirements like the ones written for Message[1] and ServerSocket[2] are
not something you'll see anyone doing. And it takes a lot of care because
you need to be careful about who you're excluding. I chose to not exclude
HTTP 1.0 and I've made HTTP upgrade and HTTP chunking optional features that
the user must check. I also chose to not exclude alternative backends. I
also chose to allow **lightweight** implementations, so embedded devices
would not be left out. cpp-netlib and pion have their own thread pools and
aren't very friendly to embedded devices. I also go fully async, allowing
really fine-grained control by the user. Even trying to be so
ambitious, I'd not say that the design went so low-level that it becomes
unusable, as the spawn example proves[3] (163 lines and a good part of that
is Boost.Asio boilerplate, not Boost.Http).

I have some experience with HTTP libraries (one of the most important being
the Tufão project[4]) and I have learned from it. This experience is
what makes me capable of judging which design decisions can be a
mistake. Not considering alternative HTTP backends from the beginning is
one of them. Other mistakes are less obvious. The Node.js API is really
tricky (an implicit decision on whether to use chunking or not) to get right
if you're concerned with portability among applications (there is an old and
fixed Tufão bug related to just this[5]). The rush to get high-level APIs
can also become a problem because you can very easily lose the
possibility of doing fine-grained adjustments. First you're fine with
single-threaded, and then you want the handling split across threads.
Then you don't want to split whole connections among threads, but to schedule
each request-response pair with a scheduler that is cleverer
than round-robin[6]. Eventually you'll end up losing the interoperability
and having to rewrite large parts of the application.

I'm very concerned about interoperability; that's why I mentioned it in the
very initial GSoC proposal last year, along the following lines:

"In fact, there is a lot of higher-level abstractions competing with each
> other, providing mostly incompatible abstractions. By not targeting this
> field, this library can actually become a new base for all these
> higher-level abstractions and allow greater interoperability among them."
> -- https://github.com/vinipsmaker/gsoc2014-boost#non-goals
>

The more I think about it, the more I believe such a "middle-level" API is
underestimated.

I mentioned in random places how much I appreciate the message-oriented
approach I'm using. Now I think I should have dedicated a whole chapter to
the topic. If it weren't for this message-oriented approach, the design
would be much more complex. It'd be like trying to solve problems of a
high-level API (set_timeout, set_scheduler, set_handler_factory,
set_allocator, set_pool, set_feature_xyz) at this level already and
thinking really hard to not miss **ANY** feature that the user could
possibly want to customize.

The message-oriented approach has one immediate impact: communication
channels and message representations are decoupled.

With this simple separation, you do not need to complicate communication
channels so much. And you also gain the ability to use your own allocators,
pools, data structures, non-allocating buffers and so on. I've read some
messages from people concerned that the API is too low-level and will still
allocate sometimes. All the examples I've written allocate, and the
reason is that they don't impose a bound on the number of HTTP
connections (there isn't a single big chunk of stack-allocated
pool/buffer/...). A new "handler" will be created as each new connection
appears. But if you put a limit, implement your own data structures and
read the documentation (it doesn't even need to be an extremely careful
read), you're done. There are some gritty decisions on implementation
details that made me allocate at some spots (use of functors, lack of
dynarray...), but then I should discuss the implementation and not mix it
with API design (unless you propose an interface to specify allocators).

It's not really that Boost.Http is too low level. It's more that only the
less polemical building blocks are available now. You'll see the same main
players in a higher-level abstraction and the main difference is most
likely that you won't manage the main players yourself.

And I advocate for a Boost.Http instead of a Boost.NetExtensions because I
allow alternative backends. Asio will always be involved because of the pure
async nature, but you might not even use the network, and instead use an
alternative backend that communicates using shared memory or other means.

There are many more thoughts, but I think I'm diverging too much from
Niall's concern, so I'll stop here and answer the rest of the questions. If
you guys have any questions about specific points, just raise them.

[1] https://boostgsoc14.github.io/boost.http/reference/message_concept.html
[2] https://boostgsoc14.github.io/boost.http/reference/server_socket_concept.html
[3] https://github.com/BoostGSoC14/boost.http/blob/0fc8dd7a594bb5ebb676d2d55621aedf75521556/example/spawn.cpp
[4] https://github.com/vinipsmaker/tufao
[5] https://github.com/vinipsmaker/tufao/issues/41
[6] https://en.wikipedia.org/wiki/Round-robin_scheduling

--
Vinícius dos Santos Oliveira
https://about.me/vinipsmaker
Niall Douglas
2015-08-15 12:06:12 UTC
Permalink
On 13 Aug 2015 at 8:58, Vinícius dos Santos Oliveira wrote:

> First of all, sorry to all members of the list for my unavailability. I
> was planning to write an experimental HTTP 2.0 backend, so I could give more
> confidence that this Boost.Http core I present for review is the right
> abstraction.
>
> Anyway, it looks like I took the wrong approach. I should have answered the
> easiest questions first and implemented the HTTP 2.0 backend (experimental,
> using an existing library and not Boost-quality) later.

That's a wise move Vinícius I think.

> > Where I'm really at is I think if Http is accepted you're going to
> > either have to ditch it and reimplement atop the Networking TS as
> > Chris folds the substantial changes WG21 will force onto ASIO into
> > Boost.ASIO, or end up refactoring Http to cleave more closely to the
> > Networking TS anyway.
>
> I'm okay with that.

You may be, but I would worry about your users. Boost users expect
stability and permanence.

> > I think for the majority of HTTP users they just want it to work
> > without surprises to a high default performance level. If you look at
> > the history of the HTTP library support in Python you'll see what I
> > mean - firstly, it's surprisingly easy to get a HTTP library API
> > design wrong, even in a v2 refactor. And secondly that people need
> > both a stupid-simple API and a more bare metal API *simultaneously*
> > with HTTP, and therein lies the design gotcha.
> >
>
> If you want a high level API, you're going to use coroutines. There is
> nothing in the wild so readable as coroutines. Coroutines are **the**
> solution to solve spaghetti code in asynchronous abstractions. Lambdas and
> futures will never be as readable as coroutines. Anyway...

God no. I would see a top level API as being synchronous, so
something like Python's requests library:

import requests
r = requests.get('http://www.boost.org')
print(r.json())

It may coroutinise but still appear to be synchronous, but that's a
separate moving ball and not one Http can tackle alone.

> The more I think, the more I believe how much such "middle-level" API is
> underestimated.
> [snip]
> It's not really like Boost.Http is too low level. It's more like only the
> less polemic building blocks are available now. You'll see the same main
> players in a higher-level abstraction and the main difference is most
> likely that you won't manage the main players yourself.

Think of it as a Boost user. The average Boost user is going to see
Boost.Http and think to themselves, "oh goody I can now write:

import requests
r = requests.get('http://www.boost.org')
print(r.json())

... in C++ and yay I've just chopped 10,000 lines of buggy hack code
from my codebase."

And that's exactly what I would expect too from a library called
Boost.Http.

What you're providing isn't that though: it's a set of low-level
HTTP-related extensions to Boost.ASIO. From my review of your code
and docs, and what you've told me, I think the ideal location for
your library is within ASIO under the asio::http namespace. However,
assuming Chris isn't keen to allow that as a git submodule within
ASIO (he might be, I don't know), I think at a very minimum you need
a name for your library which says it is a set of low-level
HTTP-related primitives for Boost.ASIO. That name is not Boost.Http
for sure. A later library you might submit could be called
Boost.Http, but it would provide a dead-easy API like the Python
requests example above.

In fact, if you can achieve a more appropriate name choice, I'll
change my vote from rejection to a conditional acceptance, as you've
done a great job in persuading me from where I was. You've impressed
me with your awareness of the tradeoffs involved in a moving API
environment, and though I personally disagree strongly with your
choice of Boost.ASIO over ASIO, I can also see that targeting a more
stable API in the form of Boost.ASIO for now makes sense.

How does that sound?

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/
Vinícius dos Santos Oliveira
2015-08-15 16:20:31 UTC
Permalink
2015-08-15 9:06 GMT-03:00 Niall Douglas <***@nedprod.com>:

> In fact, if you can achieve a more appropriate name choice, I'll
> change my vote from rejection to a conditional acceptance, as you've
> done a great job in persuading me from where I was. You've impressed
> me with your awareness of the tradeoffs involved in a moving API
> environment, and though I personally disagree strongly with your
> choice of Boost.ASIO over ASIO, I can also see that targeting a more
> stable API in the form of Boost.ASIO for now makes sense.
>
> How does that sound?
>

It sounds encouraging. I'm not that resistant to a change of name, but it
depends on the new name. I think I'd regret it if I accepted a name like
lowlevelhttp (or one conveying the same idea) and the library got stuck with
this image forever, even after I implement the high-level APIs.


--
Vinícius dos Santos Oliveira
https://about.me/vinipsmaker
Bjorn Reese
2015-08-15 08:01:59 UTC
Permalink
A quick reminder that the formal review ends on Sunday (August 16th.)

Please consider submitting a review where the following questions are
answered:

1. Should Boost.Http be accepted into Boost? Please state all
conditions for acceptance explicitly.

2. What is your evaluation of the design?

3. What is your evaluation of the implementation?

4. What is your evaluation of the documentation?

5. What is your evaluation of the potential usefulness of the library?

6. Did you try to use the library? With what compiler? Did you have
any problems?

7. How much effort did you put into your evaluation? A glance? A quick
reading? In-depth study?
Lee Clagett
2015-08-17 05:01:17 UTC
Permalink
On Fri, Aug 7, 2015 at 3:08 AM, Bjorn Reese <***@mail1.stofanet.dk>
wrote:

> Dear Boost and ASIO communities.
>
> The formal review of Vinícius dos Santos Oliveira's Boost.Http library
> starts today, August 7th and ends on Sunday August 16th.
>
> Boost.Http is designed for various forms of modern HTTP interaction,
> from normal HTTP request, over HTTP chunking and pipelining, to
> upgrading to other web protocols like WebSocket. This library builds
> on top of Boost.ASIO, and follows the threading model of ASIO.
>
> The two basic building-blocks are http::socket, which is socket that
> talks HTTP, and http::message with contains HTTP meta-data and body
> information. You can use these building-blocks to build a HTTP server
> that fits your exact needs; for instance, an embedded HTTP server for
> a ReST API. Boost.Http comes with a light-weight HTTP server and a
> static file server.
>
> Currently, Boost.Http is limited to server-side interaction, but the
> design principles used extends to client-side as well.
>
> Boost.Http was originally developed as part of GSoC 2014 and Vinícius
> has continued to develop and improve the library since then.
>
> The documentation can be found here:
>
> http://boostgsoc14.github.io/boost.http/
>
> The source code can be found here:
>
> https://github.com/BoostGSoC14/boost.http
>
> The source code is build using CMake, but Boost.Build is in the pipeline
> (already done for documentation.)
>
> Please answer the following questions:
>
> 1. Should Boost.Http be accepted into Boost? Please state all
> conditions for acceptance explicitly.
>

I do not think Boost.Http should be accepted in its current state. My main
reasons (read portions below for expanded comments/thoughts):

(1) Incomplete design - There is no built-in mechanism for writing out
client headers, which means there is no client support.
(2) Unproven design - A low-level abstraction for a "HttpSocket" is
provided, but only two higher-order methods for file serving are provided.
Why aren't there higher-order functions for more common use cases?
(3) Some design choices (or their documentation wording) need more
consideration - All data reads / writes for the message body require
a copy to be made.
(4) More time is needed to think about the design patterns (this follows
from a lack of (2)) - why is `async_write_response_continue` a requirement
for the http::ServerSocket concept?
(5) Documentation can be better organized and lacks critical information in
a few areas.


2. What is your evaluation of the design?
>

- There is no built-in mechanism for writing out client headers, which
means there is no client support. The provided rationale is that it took a
long time to design the server portion, and the client side will take
equally long. I don't see this as a valid argument; several current
library authors have indicated that providing a "boost-ready" library is an
incredible amount of work. I would expect a boost::http library to be more
feature complete before being accepted.

- An abstraction for a "HttpSocket" is provided, but only a few
higher-order functions are provided (for file serving). The current
lower-level abstractions _should_ allow for useful utilities to be written,
but I would like to see more functionality implemented before boost
acceptance, so the lower-level design is more "proven".

- The http::ServerSocket concept defines methods for writing response
headers, writing the body, writing trailers, and writing an entire response
message at once. This concept provides everything necessary for writing a
valid HTTP response. So why is `async_write_response_continue` mandated in
the http::ServerSocket concept when a free function that uses the other
concept functions could be implemented? If the reasoning is performance
(which appears to be the reason) - why isn't a more performant method
(probably some form of pre-generation) of sending messages added? This
would prevent the socket concept from continually absorbing additional
special-case messages as higher-order functions are developed, and allow
for clients to pre-generate error responses.

- `async_read_request` and `async_write_response_metadata` in the
http::ServerSocket concept require an entire http::Message concept as an
argument, when only the AssociativeContainer concept should be necessary
for these functions.

- `async_write` and `async_read_some` in the http::Socket concept require
an entire http::Message concept as an argument, when only the
SequenceContainer concept should be necessary for these functions.

- All data reads / writes for the message body require a copy to be made,
because memory from a SequenceContainer concept cannot be given to ASIO/OS
directly (std::list<std::uint8_t> is a SequenceContainer). The message body
should likely be restricted to the C++1z ContiguousContainer concept
instead. I think users would prefer speed here more than flexibility. It's
also easier to relax requirements after release than the opposite.

- Ideally the read / write functions for the payload would have the same
requirements as an asio buffer. Requiring a container prevents memory
locations from other sources (such as directly from a streambuf), and
prevents scatter-gather I/O (see the sketch after this list). I'd say this is
less critical, as it's something most users are unlikely to use.

- `async_read_trailers` and `async_write_trailers` in the http::Socket
concept require an entire http::Message concept as an argument, when only
the AssociativeContainer concept should be necessary.

- I mentioned this before; client- and server-specific header
writing/reading is a shame due to the overwhelming similarities. My earlier
suggestions were criticized for being stringly typed (or having more
states), but I think this was a somewhat weak criticism since the type
system can be used more effectively.
`async_write_header(http::response{200, "OK"}, ...)` or
`async_write_header(http::request{http::request::GET, "/"}, ...)`.

- It might be worth eventually relaxing the requirements for the key/values
in the AssociativeMap used by the field parsing functions to a subset of
string functions. begin(), end(), size(), empty(), push_back(), pop_back(),
operator<(), a hash implementation, and contiguous data. This way a quick
and dirty small-string optimization could be written for field handling.

- What is the purpose of the standalone `async_write_response` and
`async_write_response_metadata` functions? They only forward to functions
with the same name in the http::ServerSocket concept, is this cruft?

- I like that all of the asynchronous callbacks take consistent parameters.
asio::coroutine works nicely.
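
As a small illustration of the scatter-gather point above (plain Boost.Asio;
the function and its parameters are made up for the example), two unrelated
memory regions can be handed to the socket in one gather write, with no copy
into an intermediate container:

#include <boost/asio.hpp>
#include <array>
#include <string>
#include <vector>

// Illustrative only: gather the already-formatted header block and a body
// that lives elsewhere into a single write, without copying either.
void send_response(boost::asio::ip::tcp::socket &socket,
                   const std::string &headers,
                   const std::vector<char> &body)
{
    std::array<boost::asio::const_buffer, 2> buffers = {{
        boost::asio::buffer(headers),
        boost::asio::buffer(body)
    }};
    boost::asio::write(socket, buffers); // one gather write, throws on error
}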



> 3. What is your evaluation of the implementation?
>
>
Overall the quality is good. I didn't go read the code thoroughly, but I
did notice a few things -

- The current `async_write_response` implementation assumes contiguous
elements on the SequenceContainer concept, which is incorrect.

- A std::ostringstream is used, when spirit::karma would be faster and have
fewer memory allocations (minor).

- The current `async_read_some` implementation has an error code path that
can initiate a write. This is not documented anywhere, and effectively
issuing an `async_read_some` command blocks both the send and receive
channels of the underlying ASIO stream. This problem is likely to occur in
a few other spots, but I didn't check.

- There are some error code paths in async operations that modify the
entire message object (clear() is invoked on all parts of the message).
Seems unnecessary, and is at least undocumented.


4. What is your evaluation of the documentation?
>

Needs better organization, and someone to edit the rationale section in
particular.

- The reference section is organized strangely. Every class, concept,
function, and file is listed under the same header, and then it's broken
into sections at the bottom. I didn't notice the bottom portion at first,
and it should replace the top portion that lacks organization.

- How do the various write functions manipulate the headers? What fields
are added, if any? This is partially mentioned at the bottom of the
http::ServerSocket page, but I saw "content-length: 22" added to my
message, and this was never explicitly stated (although obviously assumed).
This should likely be explicitly specified in the concept itself, AND the
location should be specified (i.e. these fields are explicitly appended).
Should also document that some fields, such as content-length, cannot be
listed twice according to the specification, while comma-delimited fields
can be listed multiple times as a valid form of appending. I.e.
"content-encoding: gzip, sdch\r\n" and "content-encoding:
gzip\r\ncontent-encoding: sdch\r\n" indicate equivalent content encodings
applied to the data. Should probably mention that Transfer-Encoding is a
comma-delimited list too.

- The `key_type` and `mapped_type` for the headers portion of the
message::Concept indicate that they must meet the "String concept". I don't
recall seeing this concept being defined in C++ or boost, where does it
come from? If it's from C++1z, it might be to merge the behavior of
string_ref and basic_string, so that concept would be unlikely to require
storage like a container. Might want to re-think the concept requirements
here, or state that only a conforming std::basic_string implementation is
acceptable for now.

- Mentioned previously - the documentation for basic_socket should state
when a function is using the read or write portion of the stream. Currently
some functions use both unexpectedly, such as async_read_some.



> 5. What is your evaluation of the potential usefulness of the library?
>
>
This library has multiple potential use cases, especially if a good
low-level design is provided. Additional higher-level functions would be
nice, because many (perhaps most) users have similar requirements, don't
need extreme performance, and just want something simple and relatively
efficient. Hopefully work will be continued on this library, regardless of
the outcome of this review process.


6. Did you try to use the library? With what compiler? Did you have
> any problems?
>
>
I ran all of the tests on OSX, which had no complaints. I also wrote a
quick asio::coroutine implementation, and viewed the data passing by in
wireshark for good measure.



> 7. How much effort did you put into your evaluation? A glance? A quick
> reading? In-depth study?
>

I spent a good amount of time reading the documentation, and
implementation. A little less testing / running code.



> 8. Are you knowledgeable about the problem domain?
>

I have fairly extensive knowledge of protocols, and experience providing
implementations for protocols (including HTTP). I also have a reasonable
amount of ASIO experience.


Lee
Lee Clagett
2015-08-25 21:44:48 UTC
Permalink
On Tue, Aug 25, 2015 at 1:43 AM, Vinícius dos Santos Oliveira <
***@gmail.com> wrote:

> Thanks for your review Lee, your comments are surely helpful for me, as I'm
> going to make several improvements based on them.
>
> 2015-08-17 2:01 GMT-03:00 Lee Clagett <***@leeclagett.com>:
>
> > So why is `async_write_response_continue` mandated in the
> > http::ServerSocket concept when a free function can be implemented, that
> > uses the other concept functions? If the reasoning is performance (which
> > appears to be the reason) - why isn't a more performant method (probably
> > some form of pre-generation) of sending messages added? This would
> prevent
> > the socket concept from continually absorbing additional special case
> > messages as higher order functions are developed, and allow for clients
> to
> > pre-generate error responses.
> >
>
> `async_write_response_continue` is a different concept, it cannot be
> composed in terms of the other operations. It has different semantics. It
> changes the read_state status.
>

Did you mean write_state (if not, update documentation)? And yes, I missed
this state transition when making this suggestion.

The purpose of the state appears to be (and please correct me otherwise) to
prevent consecutive 100-continue responses from being written. I don't
think it's worth the extra state transition because the read calls do not
put the write_state into a special state that forces a 100-continue OR
error response. So a user of the library already has to know to check for
expect-100's in the request, at which point it seems reasonable to expect
the user to NOT write consecutive 100-continues.

My suggestion would be to either drop the state entirely (allowing the user
to "accidentally" write consecutive 100-continues) OR allow a write_state
change from a server header read into an expect-100-continue-response state.
The latter change may seem unpalatable, but this implementation is already
"locking" the write side of the channel during server reads to hide certain
error responses from users (bad http version, etc.).



> - `async_read_request`, and `async_write_response_metadata` in the
> > http::ServerSocket concept require an entire http::Message concept as an
> > argument, when only the AssociativeContainer concept should be necessary
> > for this function.
> >
> > - `async_write` and `async_read_some` in the http::Socket concept
> > requires an entire http::Message concept as an argument, when only the
> > SequenceContainer concept should be necessary for this function.
> >
>
> Just passing the message object is less error-prone (e.g. should I pass
> headers() or trailers()?). But yes, I could do better (decrease
> requirements and add a few helper functions/types).
>
> - All data reads / writes for the message body require a copy to be made,
> > because memory from a SequenceContainer concept cannot be given to
> ASIO/OS
> > directly (std::list<std::uint8_t> is a SequenceContainer). The message
> body
> > should likely be restricted to the C++1z ContiguousContainer concept
> > instead. I think users would prefer speed here more than flexibility. Its
> > also easier to relax requirements after release than the opposite.
> >
>
> std::vector is used by default. Not sure how helpful it is to restrict the
> user if he is willing to pay. A parser embedded in the message object would
> have trouble implementing the ContiguousContainer concept when it comes to
> chunked entities.
>
> I agree that's easier to relax requirements after release than the
> opposite.
>

A trait-dispatched system could be provided if someone really wanted
SequenceContainer support. But it's just too odd - std::list<std::uint8_t>
and other node-based containers would perform really poorly. Again, the
ASIO method is more flexible because it allows for contiguous chunks of
memory, or a range of contiguous chunks of memory, to be provided. So a
std::list<std::uint8_t> could actually be supported through the ASIO
interface (oh my!), but obviously a range of 1-byte chunks is also likely to
perform poorly.
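
For what it's worth, a rough sketch of that trait-dispatch idea (all names
invented for illustration): pick a zero-copy path for containers known to be
contiguous and fall back to copying for node-based ones.

#include <boost/asio.hpp>
#include <list>
#include <string>
#include <type_traits>
#include <vector>

// Illustrative trait: which containers we know expose contiguous storage.
template<class T> struct is_contiguous_container : std::false_type {};
template<class T, class A>
struct is_contiguous_container<std::vector<T, A>> : std::true_type {};
template<class C, class T, class A>
struct is_contiguous_container<std::basic_string<C, T, A>> : std::true_type {};

// Contiguous case: hand the container's memory to ASIO directly.
template<class SyncWriteStream, class Container>
void write_body(SyncWriteStream &stream, const Container &body, std::true_type)
{
    boost::asio::write(stream, boost::asio::buffer(body));
}

// Node-based case (e.g. std::list<std::uint8_t>): the unavoidable copy.
template<class SyncWriteStream, class Container>
void write_body(SyncWriteStream &stream, const Container &body, std::false_type)
{
    std::vector<char> copy(body.begin(), body.end());
    boost::asio::write(stream, boost::asio::buffer(copy));
}

template<class SyncWriteStream, class Container>
void write_body(SyncWriteStream &stream, const Container &body)
{
    write_body(stream, body, is_contiguous_container<Container>{});
}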



>
> - Ideally the read / write functions for the payload would have the same
> > requirements as an asio buffer. Requiring a container prevents memory
> > locations from other sources (such as directly from a streambuf), and
> > prevents scatter gather i/o. I'd say this is less critical, as its
> > something most users are unlikely to use.
> >
>
> It may kill some use cases. If some type is more performant for the user,
> it should be used. The default should satisfy most users' wishes.
>
>
What use cases does a container provide over an ASIO buffer? The only one I
can recall being mentioned is a wire & transport implementation that
sends/receives data as messages instead of a stream. I suspect the
flexibility of the ASIO buffer would appeal to more people than the ability
to read ZeroMQ messages with a slight efficiency advantage. I obviously
can't back this claim with any statistics, so I won't push much further on
this topic.

Are there other advantages or special use cases for the container-based
system? Have you considered standalone functions for writing/reading
standard HTTP/1.x chunked data directly from an ASIO stream? There might be
ways to allow effective usage of the native handle for advanced users (thus
keeping the default container implementation).



> - `async_read_trailers` and `async_write_trailers` in the http::Socket
> > concept require an entire http::Message concept as an argument, when only
> > the AssociativeContainer concept should be necessary.
> >
>
> Just passing the message object is less error-prone (e.g. should I pass
> headers() or trailers()?). But yes, I could do better (decrease
> requirements and add a few helper functions/types).
>
> - I mentioned this before; client and server specific header
> > writing/reading is a shame due to the overwhelming similarities. My
> earlier
> > suggestions were criticized for being stringly typed (or having more
> > states), but I think this was a somewhat weak criticism since the type
> > system can be used more effectively.
> > `async_write_header(http::response{200, "OK"}, ...)` or
> > `async_write_header(http::request{http::request::GET, "/"}, ...)`.
> >
>
> This design would have much more implicit behaviour and it'd lead to more
> surprises. A ResponseBuilder or the like could be done separately, in case
> you don't mind the surprises.
>

What do you mean by implicit behavior? And what surprises? The name of the
type indicates the action being taken ... ?



>
> - What is the purpose of the standalone `async_write_response` and
> > `async_write_response_metadata` functions? They only forward to functions
> > with the same name in the http::ServerSocket concept, is this cruft?
> >
>
> They fill the reason phrase for you based on the status code.
>
> - The current `async_read_some` implementation has an error code path that
> > can initiate a write. This is not documented anywhere, and effectively
> > issuing an `async_read_some` command blocks both the send and receive
> > channels of the underlying ASIO stream. This problem is likely to occur
> in
> > a few other spots, but I didn't check.
> >
>
> The ServerSocket concept gives guarantees about a producer of HTTP messages
> (and you, as the user, write the HTTP consumer). A channel to exchange
> HTTP messages somehow. There are no guarantees about exposing
> implementation-detail errors to the user.
>
> Imagine a ZeroMQ-based ServerSocket that cannot send "pings" because it was
> not documented in the general concept. The general ServerSocket concept
> doesn't restrict the freedom from HTTP "providers" too much.
>
>
The general concept needs to stipulate whether reads or writes can be made
by the client after initiating an asynchronous operation. This is the most
important thing that must be documented, because the current implementation
cannot safely accept a write operation when some (all?) of the read
operations are still in progress. I think it's acceptable for an
implementation to inject control messages under such restrictions, as long
as the user can still issue their own write when the specification allows
for it. Such implementations _may_ have to indicate the control message
behavior when documenting access to the "native handle" (and I suspect
"may" is going to be "always" in practice), in case a user tries to do some
raw writing. Or maybe the documentation just says "never do raw
read/writes" with the native handle.


> Anyway, this is a serious documentation error within the library and I need
> to fix it.
>
>
This isn't just a serious documentation error, it's a design problem for
delivering lower-latency responses. Since the read call can issue an
automatic error message on the write channel, users of the library cannot
begin reading the next request (which is likely already in a kernel buffer
if the client is pipelining requests) until the prior response has finished
sending. If the errors were not automatically written by the
implementation, a user could begin reading the next request (potentially
even preparing DB calls, etc.) while waiting for the response of the prior
request to complete.

I do not think this is a fatal issue, but it does make me question the
low-level design goals. The design/implementation seems to be halfway
between a constricting easy-to-use high-level design and a performant
low-level design. It seems reasonable to expect users of the low-level
interface to have more understanding of HTTP, while the high-level design
provides a better "safety net" for users. The low-level users would still
benefit from an HTTP system that reads/writes headers into/from an
associative container, handles receiving/sending chunked data
automatically, etc., which is convenient.

Lee