Cloud Pub/Sub

feature summary

Message persist for 7 days
low-lantency
Capturing data and distributing data
Unified global server less service - not attached to a specific project, domain or user
Smooth out traffic spikes or bursty communications
Balancing the messages
Autoscales to deal with variable volumes of data
Simplifies the distribution of events
Decouple the publisher and subscriber
Not just streaming, also support batch???

A subscriber ACKs each message for every subscription
A message is resent if subscriber takes more than “ackDeadline” to respond
A subscriber can extends the deadline per message

Order is not guaranteed
System need to deal with “out of order” and “possible duplication”
Dataflow can handle both, by order and window?

Fan in / Fan out

Push

Initiate a request to the subscriber application to deliver messages.
The subscriber needs to be an HTTPS web application, needs to have a web hook that is accessible via HTTPS, needs to be a server application
No delay - ideal if you want low latency, immediate processing of messages, close to real time performance
If cannot reach the subscriber - do a reduction exponential backoff- tries, increase the delay and try again. You can control the retry, it will retry for seven days.

pull

The subscriber ask pubsub
Just need to make a call
There will be a delay, because the subscriber is check the periodically.
If good for if you have lots of subscribers, and those subscribers are dymaticlly created.
The subscriber explicitly request the delivery of message that’s in the subscription queue, it return error if the queue is empty. And it reposed with ackID.

Dataflow can deduplicate by a provided ID
Can define your own timestamp for watermark

Bigquery streaming ingestion:
100 k rows / per table per second

Dataflow running all the time, BigQuery for ad hoc

pubsub